IASSIST

QUARTERLY


VOLUME  13 


Spring 1989


NUMBER  1 


Printing compliments of The Rand Corporation




FEATURES 


Spring  1989 




The  1985  Canada  Social  Survey  Program:  A 
review 

by  Charles  Humphrey 

The  use  of  microcomputers  for  demographic 
analysis: An overview of options for the novice
user 

by  Diane  Crispell 

The  case  for  software  as  documentation 

by  David  Bearman 

Is data redundancy the price archivists will pay
for adequate documentation?

by  Margaret  Hedstrom 

History  and  the  data  archives 

by Hans Jorgen Marker

Small-area  census  data  services  by 
microcomputer:  Applications  of  the  REDATAM 
system in Latin America and the Caribbean

by  A.  Conning,  A.  Silva,  and  L.  Finnegan 


DEPARTMENTS 


INTERNATIONAL  ASSOCIATION 
FOR  SOCIAL  SCIENCE 
INFORMATION  SERVICE  AND 
TECHNOLOGY 

ASSOCIATION 

INTERNATIONALE  POUR  LES 
SERVICES  ET  TECHNIQUES 
D'INFORMATION  EN  SCIENCES 
SOCIALES 




IASSIST 90


Editorial    Information 


The IASSIST QUARTERLY represents an international cooperative effort on the part of individuals
managing,  operating,  or  using  machine-readable  data  archives,  data  libraries,  and  data  services.  The 
QUARTERLY  reports  on  activities  related  to  the  production,  acquisition,  preservation,  processing, 
distribution,  and  use  of  machine-readable  data  carried  out  by  its  members  and  others  in  the 
international  social  science  community.  Your  contributions  and  suggestions  for  topics  of  interest  are 
welcomed.  The  views  set  forth  by  authors  of  articles  contained  in  this  publication  are  not  necessarily 
those of IASSIST.

Information  for  Authors 

The  QUARTERLY  is  published  four  times  per  year.  Articles  and  other  information  should  be 
typewritten  and  double-spaced.  Each  page  of  the  manuscript  should  be  numbered.  The  first  page 
should  contain  the  article  title,  author's  name,  affiliation,  address  to  which  correspondence  may  be 
sent,  and  telephone  number.  Footnotes  and  bibliographic  citations  should  be  consistent  in  style, 
preferably following a standard authority such as the University of Chicago Press Manual of Style or
Kate  L.  Turabian's  Manual  for  Writers.  Where  appropriate,  machine-readable  data  files  should  be 
cited  with  bibliographic  citations  consistent  in  style  with  Dodd,  Sue  A.  "Bibliographic  references  for 
numeric  social  science  data  files:  suggested  guidelines".  Journal  of  the  American  Society  for 
Information Science 30(2):77-82, March 1979. If the contribution is an announcement of a
conference,  training  session,  or  the  like,  the  text  should  include  a  mailing  address  and  a  telephone 
number  for  the  director  of  the  event  or  for  the  organization  sponsoring  the  event.  Book  notices  and 
reviews  should  not  exceed  two  double-spaced  pages.  Deadlines  for  submitting  articles  are  six  weeks 
before  publication.  Manuscripts  should  be  sent  in  duplicate  to  the  Editor: 

Walter Piovesan, Research Data Library, W.A.C. Bennett Library, Simon Fraser University,
Burnaby,  B.C.,  V5A  1S6  CANADA  (01)604/291-4349  E-Mail:  USERDLIB@SFU.BITNET 

Book  reviews  should  be  submitted  in  duplicate  to  the  Book  Review  Editor: 

Daniel  Tsang,  Main  Library,  University  of  California  P.O.  Box  19557,  Irvine,  California 
92713  USA.  (01)714/856-4978  E-Mail:  DTSANG@ORION.CF.UCI.EDU 


Key Title: Newsletter - International Association for Social Science Information Service and
Technology

ISSN - United States: 0739-1137  Copyright 1985 by IASSIST. All rights reserved.




The  1985  Canada  General 
Social  Survey: 

A  Review 


by Charles Humphrey¹
Data  Library  and  Statistics  Section 
University  Computing  Systems 
University  of  Alberta 


Introduction 

For  social  researchers,  the  character  of  a  general  social  survey  has  largely  been  shaped  by  the  series 
of  omnibus  surveys  conducted  since  1972  by  the  National  Opinion  Research  Center  (NORC).    These 
national  surveys  of  the  U.S.  population  have  served  to  monitor  public  dispositions  toward  a  wide 
range  of  social  issues  using  attitudinal  and  behavioural  indicators.    At  the  beginning  of  this  decade, 
other  countries  developed  their  own  national  surveys  which  were  similar  in  scope  to  NORC's  General 
Social  Survey.    Two  examples  are  the  ALLBUS  survey  in  the  Federal  Republic  of  Germany,  which 
was  begun  in  1980,  and  the  British  Social  Attitudes  Survey,  which  was  introduced  in  1983.    An 
international  trend  of  this  type  of  research  was  well  underway  when  Statistics  Canada  administered 
their  first  General  Social  Survey  in  1985.    The  following  discussion  describes  the  approach  taken  by 
Statistics  Canada  and  discusses  some  of  the  shortcomings  of  the  1985  survey. 


¹ Presented at the International Association for Social Science Information Service and Technology
(IASSIST) Conference held in Washington, D.C., May 26-29, 1988.




Canada's  Approach  to  a  General  Social  Survey 

One  preconception  of  general  social  surveys  is  that  they  provide  a  wide-lens  snapshot  of  society  at  a 
specific  time.    The  picture  is  expected  to  contain  both  measures  of  the  major  social  issues  in  society 
and  quality  of  life  indicators.    Such  a  survey  is  general  in  that  it  captures  a  comprehensive  image  of 
society.    This  does  not,  however,  describe  the  nature  of  the  Canada  General  Social  Survey  (GSS). 

Statistics  Canada  initiated  a  general  social  survey  to  track  social  and  economic  trends  being  missed  by 
existing  national  statistics  programs.    Thus,  the  survey  arose  to  fill  gaps  in  the  federal  government's 
statistical  information  system.    Consequently,  the  survey  was  general  only  to  the  extent  that  it 
gathered  data  that  were  not  being  collected  under  specifically  established  programs. 

Two objectives were stated for the GSS: "[f]irst, to gather data on social trends in order to monitor
changes in Canadian society over time; and secondly, to provide information on specific policy issues
of current or emerging interest."²

These  objectives  were  put  into  operation  by  dividing  the  content  of  the  survey  into  three  areas:  core, 
focus  and  classification.    The  core  content  was  intended  to  address  the  initial  objective,  that  is,  to 
monitor  social  trends.    The  focus  content  was  directed  at  the  second  objective,  which  was  to  provide 
information  about  a  specific  policy  issue.    Finally,  the  classification  data  consisted  of  standard 
demographic  information. 

In  practice,  both  the  core  and  the  focus  content  were  topical  rather  than  comprehensive.    In  1985, 
the  subject  for  the  core  component  was  personal  health  care  and  included  items  about  health  status, 
well-being,  health  problems,  smoking  habits,  alcohol  use,  sleep  behaviour,  physical  activity,  and  use 
of  health  care  services.    The  content  for  the  focus  component  dealt  with  social  support  for  the 
elderly. 

Another  aspect  of  the  GSS  that  differed  from  other  general  social  surveys  was  its  sampling  design. 
The  target  population  contained  respondents  younger  than  the  minimum  age  allowed  in  other  general 
social surveys. The GSS sampled people who were fifteen years of age or older.³ One positive
consequence  is  that  the  personal  health  practices  of  teenagers  can  be  examined,  especially  their 
drinking  and  smoking  behaviour. 

Not  only  did  the  GSS  permit  a  slightly  younger  target  population,  it  also  employed  a  mixed  sampling 
methodology.    Two  different  random-digit  dialing  techniques  were  used  to  select  respondents  who 
were  under  the  age  of  sixty-five.    People  chosen  by  one  of  these  two  methods  were  interviewed  by 
telephone.    A  third  sampling  method  was  applied  to  select  people  who  were  sixty-five  or  older.    This 


² Statistics Canada, General Social Survey: Health and Social Support, 1985, "Public Use Micro Data
File Documentation and User's Guide," December, 1986, p. 4.

³ Several aspects of the GSS follow conventions of the Canadian Labour Force Survey, which, for
example, uses the same minimum age in its surveys. Other survey practices that are common to
both of these Statistics Canada projects are mentioned elsewhere in this paper, including shared
sampling frames and similar question styles. The stamp of the Canadian Labour Force Survey on
the GSS is pronounced and should be viewed as the influence of a common administration.



group  -  which  was  oversampled  -  was  drawn  from  the  sampling  frame  used  for  the  Canadian 
Labour  Force  Survey.    Those  chosen  from  this  list  received  personal  interviews.    The  end  result  for 
secondary  users  of  the  data  is  that  a  weight  variable  is  necessary  to  correct  for  these  different 
sampling techniques and the oversampling.
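
To illustrate why such a weight variable matters, here is a minimal sketch of how oversampling one group distorts an unweighted estimate and how a simple stratum weight corrects it. The population shares, sample counts, and group means are hypothetical figures chosen for illustration; they are not taken from the GSS documentation, and the actual GSS weights would also have to account for the different selection methods.

```python
# Hypothetical illustration of weighting an oversampled group.
# None of these numbers come from the GSS public use file.

def stratum_weights(pop_shares, sample_counts):
    """Weight per stratum = population share / sample share."""
    n = sum(sample_counts.values())
    return {s: pop_shares[s] / (sample_counts[s] / n) for s in sample_counts}

def weighted_mean(means, weights, sample_counts):
    """Mean over respondents, each carrying the weight of their stratum."""
    total = sum(weights[s] * sample_counts[s] for s in means)
    return sum(weights[s] * sample_counts[s] * means[s] for s in means) / total

pop_shares = {"under_65": 0.90, "65_plus": 0.10}    # assumed population shares
sample_counts = {"under_65": 700, "65_plus": 300}   # 65+ group is oversampled
means = {"under_65": 2.0, "65_plus": 6.0}           # assumed group means

weights = stratum_weights(pop_shares, sample_counts)
unweighted = (
    sum(sample_counts[s] * means[s] for s in means) / sum(sample_counts.values())
)
weighted = weighted_mean(means, weights, sample_counts)

print(f"Unweighted mean: {unweighted:.2f}")
print(f"Weighted mean:   {weighted:.2f}")
```

In this made-up case the unweighted mean is pulled toward the oversampled older group, while the weighted mean matches what the assumed population shares imply.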

To  summarize,  the  GSS  deviated  from  the  design  of  other  national  general  social  surveys.    The 
content  of  the  GSS  was  topical  rather  than  comprehensive,  and  it  applied  a  complex  sampling 
methodology  mixing  three  different  sampling  techniques  and  two  modes  of  interviewing  -  in  person 
and  telephone. 


Cooperative  Federalism  and  a  Large  Sample  Size 

Today's  political  and  social  influences  upon  survey  research  are  not  always  apparent;  yet,  such 
research  is  unquestionably  shaped  by  these  forces.    One  particular  feature  of  the  Canadian  polity 
seems  to  have  left  its  mark  upon  the  1985  General  Social  Survey. 

A Canadian journalist once wrote that "Canada is the only country in the world where you can buy
a book on federal-provincial relations at an airport."⁴ This comment symbolizes an ongoing political
debate about the division of powers between the federal and provincial governments that has
persisted since Confederation in 1867. One consequence of this debate has been the emergence of
unique institutions, including royal commissions investigating governmental powers, First Ministers'
conferences addressing social and political issues, and government departments dedicated to
intergovernmental affairs. During a recent period, the debate assumed a new tenor which became
known  as  cooperative  federalism.    The  heart  of  cooperative  federalism  is  the  process  of  continuous 
consultation  between  federal  and  provincial  bureaucrats  and  politicians. 

One feature of the 1985 GSS that appears to have been influenced by cooperative federalism is its
sample size. The total sample for the telephone component was 9,675 households, and from this
base 8,070 interviews were completed. For the personal interview component, the sample size was
3,620, from which 3,130 interviews were completed.⁵ Thus, the total number of completed interviews
was 11,200. In the United States, whose population is about ten times Canada's, the total
number of cases for one of NORC's General Social Surveys is around 1,500. Canada's 1985 GSS
alone had a number of cases equivalent to about seven years of NORC studies.
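
The arithmetic behind these figures can be checked with a short sketch. It uses only the counts quoted in the paragraph above. The "completion rate" computed here is simply completed interviews divided by the stated sample size, which is an assumption about how to read the figures and not necessarily the response-rate definition used by Statistics Canada.

```python
# Counts quoted in the text; variable names are illustrative only.
telephone_sampled, telephone_completed = 9_675, 8_070
personal_sampled, personal_completed = 3_620, 3_130
norc_typical_cases = 1_500  # approximate size of one NORC survey, per the text

total_completed = telephone_completed + personal_completed
telephone_rate = telephone_completed / telephone_sampled
personal_rate = personal_completed / personal_sampled
norc_equivalent = total_completed / norc_typical_cases

print(f"Completed interviews:       {total_completed}")
print(f"Telephone completion rate:  {telephone_rate:.1%}")
print(f"Personal completion rate:   {personal_rate:.1%}")
print(f"Equivalent NORC surveys:    {norc_equivalent:.1f}")
```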

Why  was  the  1985  GSS  sample  size  so  large  by  comparison  to  other  national  general  social  surveys? 
When asked this question, a spokesman for the GSS provided two answers.⁶ First, the sampling
frame  for  the  senior  citizen  portion  was  taken  from  the  Canadian  Labour  Force  Survey.    The 
administrators  of  the  GSS  maintained  that  constructing  a  new  sampling  frame  would  be  too 


⁴ Michael Valpy, The Globe and Mail, August 8, 1981.

⁵ Statistics Canada, General Social Survey Analysis Series: Health and Social Support, 1985,
December, 1987, p. 21.

⁶ This statement was made at a colloquium arranged by the Sociology Department at the University
of Alberta during the fall of 1985.




expensive. Secondly, and more to the point being made, Statistics Canada had agreements with
provincial  statistics  bureaus  to  provide  them  with  enough  cases  to  conduct  analyses  at  the  provincial 
level.    Thus,  consultation  with  provincial  statistics  bureaus  necessitated  a  large  sample  size.    Here 
cooperative  federalism  appears  to  have  left  its  imprint. 


Does  the  Survey  Get  a  Clean  Bill  of  Health? 

When  social  historians  begin  to  sift  through  the  survey  research  of  the  past  two  decades,  someone 
will  likely  observe  that  the  questions  asked  in  those  studies  reveal  more  about  a  society's  condition 
than  the  answers  given  by  the  respondents.    What  do  the  questions  asked  in  the  1985  GSS  say  about 
Canadian  society? 

Several  of  the  questions  in  the  GSS  address  a  pressing  concern  of  the  1970's:  Is  the  typical  sixty-five 
year  old  Swede  more  fit  than  the  average  thirty-five  year  old  Canadian?    Serious  attention  had  been 
given  to  the  general  fitness  of  Canadians  in  two  earlier  national  surveys,  the  1977  Canada  Health 
Survey  and  the  1981  Canada  Fitness  Survey.    As  baby  boomers  approach  middle  age,  the  concern  is 
growing  that  their  demands  on  public  health  care  will  exceed  the  system's  capacity  to  respond. 
These  surveys  reflect  an  attempt  to  monitor  the  general  fitness  of  society  in  anticipation  of  the  health 
care  system  becoming  overloaded. 

Nearly half of the 168 questions of the GSS tap some aspect of general fitness, which is also
referred to as health status. Many of these items contribute to an assessment of current physical
ability and activity (see List 1). Other items ask about smoking and drinking behaviour, both viewed
as barriers to good health.⁷ Three questions asked about feelings of satisfaction and happiness, which
offer some measure of emotional well-being. In total, these items provide an overall indication of
general fitness. (See List 1 in Appendix A.)

In  reviewing  the  sections  of  the  survey  dealing  with  health  status,  two  observations  are  critical.    First, 
all but a few of the questions were asked in behavioural terms.⁸ The only exceptions were the
items about life satisfaction and happiness. The survey is devoid of attitudinal questions. It may
be that a government statistics agency cannot, for political reasons, ask attitudinal questions.
Or perhaps the style of questioning adhered too strictly to the wording used in the
Canadian Labour Force Survey, which concentrates on behavioural items about employment and
seeking employment. Whatever the reason, there is a clear imbalance in favor of questions expressed


⁷ Two questions about smoking can be directly compared with the 1977 Canada Health Survey: "At
what age did you start smoking cigarettes daily?" and "About how many cigarettes do you smoke
each day?" One drinking question is identical between the two studies: "At what age did you start
drinking alcoholic beverages?" The discussion about smoking and drinking as barriers to health can
be found in: Statistics Canada, General Social Survey Analysis Series: Health and Social Support,
1985, December, 1987, p. 15.

⁸ Examples of behavioural wording include items that start with the following phrases: How many
times; Do you have any trouble; Which did you do; At what age did you; About how many; Did
you ever; How long have you; etc.



in  behavioural  terms. 

The  second  observation  is  that  no  social  issues  in  the  area  of  health  were  really  investigated. 
Tangentially,  one  item  came  close  to  touching  upon  a  social  issue  connected  to  smoking.    This 
question asked, "How many people in your household, excluding yourself, smoke daily?" Since 1985,
the issue of second-hand smoke has rapidly grown in significance. There are now Canadian airlines
that offer smoke-free flights. Legislation is being introduced to guarantee smoke-free workplaces.
While the item in the GSS did not monitor people's feelings about second-hand smoke, it did
measure the number of people living in an environment containing second-hand smoke.⁹

A  predominant  shortcoming  of  the  GSS  is  that  it  missed  some  very  important  health  care  issues.    A 
perusal  of  the  major,  daily  Canadian  newspapers  in  1985  reveals  several  issues  that  should  have  been 
addressed  in  a  general  social  survey  dedicated  to  health  care.    A  discussion  of  some  of  these  issues 
follows. 

The single most significant health care issue in 1985 was the struggle by the federal government to
ban the practice of extra-billing. Each province in Canada determines its own fee schedule for
medical services. In turn, physicians receive from the provincial health insurance plans the amount
set in the schedule for different items of service.¹⁰ In the 1970's, a large number of physicians
became disgruntled with the fee schedules and began to charge their patients a fee in excess of the
provincial health care plan. This practice became known as extra-billing.

In 1984, the Canada Health Act was passed unanimously by the House of Commons, allowing the
federal government to deduct one dollar of medicare transfer payments to the provinces for each
dollar that a physician collected through extra-billing. In Alberta, where the provincial government
permitted physicians to extra-bill, residents were paying $1.6 million a month because of the extra
fees: $800,000 to the physicians in extra fees and $800,000 of provincial tax funds to fill the gap
created by the penalty imposed by the federal government. In July, 1985, the Canadian Medical
Association launched a suit in the Supreme Court of Ontario, alleging that the federal government
had exceeded its jurisdiction by withholding medicare funds from provinces that allowed extra-billing
by physicians and hospital user fees. This issue had been debated publicly throughout the year
leading up to the GSS and was clearly salient in the public's mind, especially since it also arose as
an issue during the federal election in September, 1984.
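
The dollar-for-dollar penalty described above can be expressed as a small calculation. This is only a sketch of the arithmetic in the Alberta example; the function name and the assumption that the province replaces the entire withheld transfer from its own tax funds are illustrative, not drawn from the Act itself.

```python
def monthly_cost_to_residents(extra_billed, penalty_per_dollar=1.0):
    """Return (federal penalty, total cost to residents) for one month.

    Assumes the province makes up the whole withheld transfer from
    provincial tax funds, as in the Alberta example in the text.
    """
    penalty = extra_billed * penalty_per_dollar
    return penalty, extra_billed + penalty

# Alberta figure quoted in the text: $800,000 a month in extra fees.
penalty, total = monthly_cost_to_residents(800_000)

print(f"Federal penalty: ${penalty:,.0f}")
print(f"Total cost:      ${total:,.0f}")
```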

The  argument  cannot  be  posed  that  the  extra-billing  issue  defies  measurement  in  a  social  survey. 
For  example,  the  following  question  was  asked  in  the  1980  Edmonton  Area  Study:  "How  many 
doctors charged you a personal fee in addition to the fee that Medicare paid?"¹¹ The wording of this
question  is  even  consistent  with  the  style  used  in  the  GSS.    Several  other  items  have  been  asked  in 
Edmonton  Area  Studies  about  this  issue  (See  List  2  for  further  examples). 


⁹ The ISSP module for 1985 (described below) included an item about smoking in public spaces: "Do
you think that smoking in public spaces should be prohibited by law?"

¹⁰ Not all items of health care are provided through the provincial health plans, and this itself has
emerged as an issue. Physicians have been arguing that the fee schedules are not keeping pace with
items of service applying new technology, such as laser therapy or certain cardiac procedures.

¹¹ The Edmonton Area Studies, which have been conducted annually since 1977 by the Population
Research Laboratory at the University of Alberta, are a series of surveys dealing with quality of life
issues.




There were other issues about health care in the public eye in 1985.¹² No statistical evidence was
provided to support his assertion, which was something that a general social survey could have
provided. The Vancouver Sun carried a story about the problem of doctors being poorly distributed
across Canada: "Dr. William Vail said neither the public nor the profession is happy when there are
too few doctors. 'They find it difficult to get access to a doctor while we work our butts off.'"¹³ The
Edmonton Journal published an article summarizing some of the findings of a survey conducted by
the Alberta Medical Association.¹⁴ In this survey, respondents were asked to identify medical issues
that they perceived to be major problems. "Long waiting lists to see specialists, problems
communicating with doctors, and doctors in a rush were often cited as major problems."¹⁵ One last
example of a salient health issue was reported in an Edmonton Journal story about the cost of
transplants.¹⁶ "Health care policy-makers have been inconsistent in dealing with heart transplants,
Roger Evans said. While reluctant to approve them, they proceed with other equally expensive
treatment."¹⁷

In  addition  to  issues  surrounding  the  use  of  health  care,  another  related  concern  that  is  growing  in 
importance  deals  with  impaired  driving.    Since  1985,  brewing  corporations  have  introduced  media 
campaigns  dissuading  drinkers  from  driving.    The  1984  Edmonton  Area  Study  discovered  that  one  in 
four  drivers  acknowledged  that  they  drove  impaired  at  least  once  a  month.    What  is  known  about  the 
incidence  of  this  problem  nationally?    Again,  this  is  an  item  well  fitted  for  a  general  social  survey. 

The argument that a survey such as the GSS simply cannot canvass all health issues goes without
saying.    Nevertheless,  the  topics  covered  in  the  1985  GSS  were  far  too  narrow  to  provide  a  snapshot 
of  the  general  disposition  of  Canadians  toward  health  care. 


Concluding  Observations:  The  Cost  of  Being  Different 

Societies measure those aspects of life that are valued the most. In Western industrialized nations,
this is exemplified by the tremendous effort and expense that go into gathering data about the
performance  of  national  economies,  ranging  from  daily  market  indices  to  annual  measures  of 
production.    While  programs  that  monitor  trends  in  social  issues  will  never  equal  the  volume  of 
economic  data  being  gathered,  recent  cross-national  survey  programs  are  providing  a  wealth  of 
comparative  data  about  life  in  contemporary  societies. 

One program in particular - the International Social Survey Program (ISSP)¹⁸ - has been capitalizing


¹² The Halifax Chronicle Herald reported Dr. Robert Anderson claiming that trust in doctors was at
an all-time low.

¹³ The Vancouver Sun, "Distribution of doctors called national problem," May 17, 1985, p. A20.

¹⁴ The Edmonton Journal, "Few patients upset by extra-billing, says AMA survey," October 10, 1985,
p. H8.

¹⁵ ibid.

¹⁶ The Edmonton Journal, "Cost of transplants called comparable," May 26, 1985, p. A2.

¹⁷ ibid.

¹⁸ The ISSP is an international consortium of social scientists from Australia, Hungary, Ireland,




on  the  growth  of  national  general  social  surveys.    Each  year  since  1985  the  ISSP  has  coordinated  the 
replication  of  a  set  of  questions  in  participating  countries.    The  annual  topics  of  these  question  sets 
have  been,  respectively,  the  role  of  government,  social  networks  and  social  support,  beliefs  about 
social  inequality,  and  the  family. 

The  data  from  the  ISSP  modules  offer  ideal  opportunities  to  conduct  comparative  analyses  among 
participating  countries.    The  value  of  comparative  data  is  that  it  provides  an  often  needed  reference 
point  to  examine  specific  findings.    The  discovery  in  a  national  survey  that  sixty  percent  of  the 
respondents  approve  of  abortion  on  demand  may  seem  to  make  an  important  statement  about  that 
society.    Nevertheless,  the  percentage  may  be  identical  for  comparable  nations.    Without  a  reference 
point, the task of distinguishing meaningful findings is difficult.

A reference point can be established using one of two approaches. First, an item may be compared
over time. For example, if five years earlier the same item about abortion had been approved by
only thirty percent of the respondents, the large shift in public sentiment would suggest that the
society had undergone some substantive change. The other approach compares items across similar
countries, which is possible using the ISSP data. For example, if in a comparable nation only thirty
percent support abortion on demand, an important difference between those societies would exist.
Thus, an analysis might focus upon factors that could explain those differences for that particular
item.
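
One way to make either comparison concrete is a two-proportion test on the same item from two surveys. The sketch below uses the sixty and thirty percent figures from the example above; the sample size of 1,500 per survey is an assumption made here for illustration (it mirrors the typical NORC sample mentioned earlier), and the pooled z statistic is a standard textbook technique, not a method prescribed by the ISSP.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

n = 1_500  # assumed number of respondents in each survey

# Same item, two surveys: 60% approval versus 30% approval.
z_shift = two_proportion_z(0.60, n, 0.30, n)

# Same item, identical percentages in the comparison survey.
z_same = two_proportion_z(0.60, n, 0.60, n)

print(f"z for 60% vs 30%: {z_shift:.2f}")
print(f"z for 60% vs 60%: {z_same:.2f}")
```

A large absolute z value flags a difference worth explaining, while a value near zero suggests that the sixty percent figure, taken alone, says little that is distinctive about the society.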

The  danger  in  not  having  a  reference  point  lies  in  not  being  able  to  identify  meaningful  findings. 
Gazing  upon  the  national  navel  often  produces  research  myopia.    Without  a  comparative  unit, 
significant  explanatory  variables  can  be  easily  overlooked.    Cross-national  differences  can  highlight 
factors  that  might  otherwise  be  missed  within  the  study  of  a  single  country. 

The  absence  of  Canada's  participation  in  the  ISSP  has  been  unfortunate.    While  Statistics  Canada 
may  cover  some  ISSP  topics  in  its  General  Social  Survey,  the  comparative  nature  of  the  data  will 
undoubtedly  be  limited  or  inapplicable.    For  example,  until  Canada's  General  Social  Survey 
incorporates  items  measuring  attitudes,  the  ISSP  modules  will  have  a  number  of  questions  that  will 
never  be  asked  by  Statistics  Canada.    Furthermore,  the  importance  of  gathering  these  data  within  a 
common  time  frame  can  be  significant  to  mediate  the  possible  influence  of  world  events.    Thus, 
simply  covering  ISSP  topics  without  regard  to  the  survey  year  may  also  diminish  the  cross-national 
comparative  value  of  the  data. 

If  the  mandate  for  the  General  Social  Survey  within  Statistics  Canada  simply  cannot  be  coordinated 
with  the  ISSP,  an  alternative  Canadian  program  should  be  introduced  that  would  incorporate  a 
comprehensive collection of Canadian social issues with the ISSP modules.


¹⁸ (cont'd) Italy, the Netherlands, the United Kingdom, the United States, and West Germany.




List 1


Examples of Physical Ability and Activity Questions
from the General Social Survey, 1985†


32. Do you have trouble cutting your own toenails?

33. Do you have trouble using your fingers to grasp or handle?

34. Do you have any trouble reaching above your head?

43. Over the last 3 months which did you do most frequently?

    □ Running or jogging
    □ Bicycling
    □ Tennis
    □ Exercise in a class or at home
    □ Swimming
    □ Racquetball or squash
    □ Other


† Statistics Canada, General Social Survey Analysis Series: Health and Social Support,
1985, December, 1987, Appendix I.




List  2 


Examples  of  Questions  about  Extra-Billing 
Asked  in  Three  Edmonton  Area  Studies 


Year   Question Number and Text


1980   76. b. Please tell me how much you agree or disagree with a
       doctor charging you a fee in addition to the fee that
       medicare pays.

1982   23. We have heard a great deal lately about the standard of
       health care in the province, how much physicians should be
       paid, and the method of payment. Have you discussed these
       issues with any of your personal physicians the last six
       months?

1982   24. In your opinion, how satisfied or dissatisfied were you,
       generally speaking, with the standard of medical care your
       physicians provided before the fee and billing issues
       flared up last fall?

1982   25. Please tell me how much you agree or disagree with a doctor
       charging you a fee in addition to the fee that medicare
       pays the doctor (extra billing)?

1982   26. Please tell me how much you agree or disagree with a doctor
       making you pay all the fee and then getting some or all of
       the money back from Medicare yourself (direct billing).

1984   53. a. Please think for a moment about the physician or
       physicians you have seen the most often in the last
       year. If your doctor left the government medicare plan,
       would you still go to see that doctor even if it meant
       having to pay the doctor yourself and then applying to
       medicare for the money that you would get back?

1984   53. b. What if you had to see a new doctor, not any of the ones
       you have been seeing? If you had a choice between a
       physician who practiced inside the government system and a
       physician who practiced outside that plan, all other things
       being equal, would you choose the physician who practiced
       outside the plan?




The Use of Microcomputers
for Demographic Analysis:

An Overview of Options
for the Novice User


by Diane Crispell¹
Associate Editor
American Demographics magazine
P.O. Box 68
Ithaca, NY 14851


Introduction 

As  the  microcomputer  editor  and  research 
associate  of  American  Demographics,  I  review 
demographic-related  software  and  answer 
research questions. People often ask us about
software  programs  and  data  sources.    I  will  be 
focusing  on  demographic  and  socioeconomic 
information  since  that  is  my  forte,  but  the  same 
observations  apply  to  any  social  science  data. 


'Presented  at  the  International  Association  for 
Social  Science  Information  Service  and 
Technology  (lASSIST)  Conference  held  in 
Washington, D.C., May 26-29, 1988


To  give  you  a  little  background  on  my 
company,  American  Demographics  publishes  a 
monthly  magazine  about  demographic  and 
consumer  trends.    Our  subscribers  are  primarily 
business  people  who  are  learning  how  to  use 
demographic  information.    They  are  not 
academic  demographers,  but  they  have  a  need 
to  analyze  demographic  data.    At  the  same  time, 
the  use  of  microcomputers  has  become  nearly 
universal  in  business.    A  natural  connection  is 
the  use  of  microcomputers  for  demographic 
analysis,  which  is  what  my  monthly  column  in 
the magazine is all about.

The  demographic  data  industry,  an  industry  that 
followed  in  the  wake  of  the  1980  Census  of 
Population,  has  been  a  forerunner  in  the  release 
of  desktop  systems  for  demographic  analysis. 
You  may  or  may  not  be  familiar  with  some  of 
these  firms  -  CACI,  Claritas,  Donnelley 
Marketing,  Market  Statistics,  and  National 
Planning  Data  Corporation,  to  name  a  few. 
National  Decision  Systems  came  out  with  the 
first  major  stand-alone  PC  system  in  1985  -  the 
Infomark  laser  disk  system.    Since  then,  PC 
applications  have  become  increasingly 
sophisticated. Most major vendors now offer
desktop  systems  with  huge  databases  of 
population  estimates  and  projections,  mapping 
applications,  retail  sales  estimates, 
geodemographic  segmentation  systems,  and  so 
on. 

For  the  most  part,  these  systems  are  for 
retrieval  and  display  of  data  rather  than  for  true 
data  analysis.    Furthermore,  they  are  often 
complex  to  learn  and  require  the  absolute  latest 
in  hardware,  such  as  CD  readers,  20x20 
Bernoulli boxes, and massive amounts of
memory.    And  finally,  the  data  used  with  these 
systems  are  proprietary  and  very  expensive. 

So  what  about  the  novice  or  occasional  user? 
What  are  the  alternatives?    You  can  do 
demographic  analysis  on  a  microcomputer 
without all the bells and whistles.




Spreadsheet  Tactics 

If you're a number-cruncher of any kind, one
of the first things you learn to do with a
microcomputer is to use a spreadsheet.
Whether it be Lotus 1-2-3 or any of the many
similar programs, setting up rows and columns
of numbers and doing calculations on them is a
basic and a must.

But many of us don't get much beyond the
basics.    And  we  may  not  realize  how  we  can 
use  spreadsheets  to  calculate  complex  statistics 
with  survey  data,  or  to  generate  forecasts.    Or 
we  may  feel  that  we  can't  program  the 
spreadsheet  to  perform  these  functions. 

There  is  help.    Walonick  Associates  sells  a 
series  of  StatPackets,  small  programs  that  do 
statistical  analyses  using  spreadsheet  data  files. 
Each  StatPacket  does  a  specific  statistical 
procedure,  such  as  descriptive  statistics, 
crosstabulation,  or  multiple  regression.    You  set 
up  your  spreadsheet  file  in  the  correct  format 
(cases  in  rows,  variables  in  columns)  and  save 
it,  then  run  StatPackets.    Select  the  procedure, 
spreadsheet file, some analysis options, and off
you  go.    This  can  be  a  sensible  alternative  to 
using  a  full-blown  and  complex  statistical 
package  like  Walonick's  StatPac  Gold.    You  can 
also  produce  rough  graphics  from  spreadsheet 
programs,  as  an  alternative  to  learning  a  whole 
separate  graphics  program.    Some  of  the  new 
spreadsheets  like  Borland's  Quattro  have  pretty 
nice  graphics,  better  than  Lotus  1-2-3  anyway, 
and  they're  cheaper,  too. 

You can use spreadsheets for analysis of data
other than survey data, too. We're holding a
series of microcomputer workshops this year,
and one of the exercises for attendees is to
produce population projections with a
spreadsheet. We're providing them with
templates and some choices of datasets, so they
can see the assumptions and formulas involved
in doing this kind of forecasting. One template
I've used for a long time is one that calculates
the percentages needed to produce a pyramid of
the population by sex and age groups.
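
The pyramid calculation itself is simple: each age-sex cell is divided by the total population. A minimal Python sketch of that one formula follows. The function name `pyramid_percentages` and the counts are assumptions made up for illustration; they are not taken from the workshop templates.

```python
def pyramid_percentages(counts):
    """Return each age-sex cell as a percentage of the total population.

    counts maps an age-group label to a (males, females) pair.
    """
    total = sum(m + f for m, f in counts.values())
    if total == 0:
        raise ValueError("population total is zero")
    return {
        group: (100.0 * m / total, 100.0 * f / total)
        for group, (m, f) in counts.items()
    }


# Illustrative counts only, not real census figures.
sample = {
    "0-14": (300, 290),
    "15-64": (650, 660),
    "65+": (40, 60),
}

for group, (male_pct, female_pct) in pyramid_percentages(sample).items():
    # Males are conventionally drawn left of the axis, females right.
    print(f"{group:>6}  M {male_pct:5.1f}%  F {female_pct:5.1f}%")
```

In a spreadsheet the same thing is one formula per cell, with the grand total held in a fixed cell; the percentages across all cells should sum to 100.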


Bigger  Packages 

Of  course,  you  can't  do  a  whole  lot  of  raw  data 
processing  with  just  a  spreadsheet  program. 
Sooner  or  later,  you  have  to  get  involved  with  a 
survey  or  statistical  analysis  program.    The 
spectrum  of  software  available  is  broad. 

Where  you  start  with  microcomputer  data 
processing  depends,  first  and  foremost,  on  how 
you  get  your  data.    Do  you  collect  them 
yourself?    If  you  do,  is  it  through  telephone  or 
personal  interviews?    Mail-in  questionnaires?    If 
you're  using  secondary  data,  what  format  are 
they  in?    Are  they  on  tapes,  floppy  disks?    How 
do  you  get  the  data  onto  your  microcomputer 
anyway? 

In  my  research  at  American  Demographics,  I've 
done  a  little  bit  of  everything  somewhere  along 
the  line.    We  don't  do  a  lot  of  primary  research 
ourselves,  but  we  do  conduct  reader  surveys 
occasionally,  in  the  form  of  mail-in 
questionnaires.    I've  analyzed  several  of  these 
using  survey  tabulation  software. 

Survey  Packages 

These  programs  vary  from  small  and  simple  to 
vast  and  complex.    At  the  lower  end,  you  will 
find  programs  like  Henry  Elkins  &  Associates' 
SurveyMate,  which  Dr.    Elkins  developed 
primarily  for  his  own  use  and  that  of  other 
social  scientists  out  in  the  field.    It  is  geared 
towards  easy  data  entry  and  it  is  inexpensive 
($145),  but  this  doesn't  mean  it's  a  small 
program.    SurveyMate  can  handle  up  to  1,000 
data  fields  and  32,000  cases,  ample  for  many 
purposes.    It  doesn't  do  every  kind  of  statistical 
analysis,  but  it  does  frequencies,  crosstabulation, 
and  multiple  regression,  and  you  can  export  the 
data  to  ASCII  files  for  presentation  purposes  or 
for  further  analysis.    When  I  talked  to  Dr. 
Elkins  recently,  he  told  me  the  program  is  now 
used  in  42  countries  -  by  academics, 
government  agencies,  and  market  research  firms 
-  in  places  like  Nigeria,  Singapore,  and 
Papua-New  Guinea. 

I've  looked  at  a  few  other  survey  programs,  like 
SurveyTab,  UNCLE,  and  the  Survey  System. 
These  more  costly  commercial  programs  tend  to 
produce nicer looking tables and may offer
more  user-friendly  features,  but  I  think  that  you 
should  choose  survey  software  on  the  basis  of 
what  you  need.    I  have  a  few  guides  as  to  what 
questions  to  ask  about  survey  software: 

1.  capacity  -  This  is  pretty  basic.    Can  the 
program  handle  the  number  of  cases  and 
variables  you  have? 

2.  type  of  data  entry  -  would  you  rather  enter 
the  data  field  by  field  on  a  screen  displaying 
the  questions,  or  would  you  rather  enter  data 
in  an  80-column  card  format,  just  a  string  of 
digits  across  the  screen?    The  former  method 
is good if you don't have huge amounts of
data and have less experienced people doing the
entry.    The  latter  is  faster,  but  takes  more 
concentration  and  expertise.    And  most 
programs  seem  to  do  it  one  way  or  the 
other,  so  you  should  decide  before  you  buy. 

3.  recoding and transformation - how flexible is
the software in terms of regrouping data into
ranges (like age groups) or handling other
specifications, such as missing values? How
does the program deal with
multiple-response, open-ended questions, and
other tricky survey data quirks?

4.  analysis - can the program perform all of
the procedures you need? If it can't, how
easy is it to transfer the data to another
program that can, and do you want to get
involved with this?
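
To make the recoding question in point 3 concrete, here is a minimal Python sketch of regrouping raw ages into ranges while keeping missing values separate. The age bands, the missing-value code 999, and the function names are all assumptions chosen for illustration; they do not describe how any particular survey package works.

```python
MISSING = 999  # hypothetical code the questionnaire uses for "no answer"

# (lower bound, upper bound, label); bounds are inclusive.
AGE_GROUPS = [
    (0, 17, "under 18"),
    (18, 34, "18-34"),
    (35, 54, "35-54"),
    (55, 120, "55 and over"),
]


def recode_age(age):
    """Map a raw age to its group label, or None if missing or out of range."""
    if age is None or age == MISSING:
        return None
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return None


def frequencies(ages):
    """Count cases per age group, reporting missing values separately."""
    counts = {label: 0 for _, _, label in AGE_GROUPS}
    counts["missing"] = 0
    for age in ages:
        label = recode_age(age)
        counts["missing" if label is None else label] += 1
    return counts


print(frequencies([23, 41, 999, 17, 67, 35, None]))
```

A package that handles recoding well lets you declare the ranges and the missing-value code once, as the two constants do here, instead of editing the data by hand.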

Statistical  Analysis 

In  many  cases,  survey  software  isn't  enough. 
You  may  need  more  complex  statistical 
procedures  such  as  analysis  of  variance.    Or 
you may want to link data analyses to
graphics  or  mapping  programs  for 
presentation  purposes.    Or  you  may  not  need 
a  program  to  enter  raw  data,  but  already 
have  it  on  a  tape  or  floppy  disks.    In  this 
case,  you  might  need  a  full-blown  statistical 
package. 

There are a lot of these around, too. A lot
of people have called and asked me, "What
package should I buy?" I have no pat
answer; it depends so much on what you're
doing. As with survey software, when
choosing a stat program, check its data
capacity, ease of use, and analysis options.
The big programs are pricier, but of course
they do more. Also look out for additional
"modules," which a number of programs tack
on, at extra cost. For example, SPSS-PC+ is
handy for those used to the mainframe
version, and the learning curve is smaller,
but if you want advanced statistics, that's
another few disks (and a few hundred
dollars). The same goes for nicely formatted
tables, and user-defined data entry modules,
as well as separate graphics and mapping
programs. It's sort of unavoidable -
statistical analysis programs take up a lot of
computer space, and it is nice to be able to
buy only the parts you need.




Demographic  Data  and  Software  Options 

I'm  not  going  to  go  into  more  detail  about 
these  kinds  of  analysis  programs.    Many  people 
don't  need  to  analyze  raw  data.    They  want  to 
retrieve  and  manipulate,  in  relatively  simple 
ways,  existing  data. 

The array of demographic data and software for
microcomputers  is  enormous.    Sources  range 
from  government  agencies  to  private  vendors,  as 
I  mentioned  earlier.    Data  media  range  from 
datasets  on  floppy  disks  to  massive  databases  on 
optical  disks.    Software  capabilities  range  from 
simple  retrieval  to  thematic  mapping. 

Government  Sources 

I'm  going  to  give  a  brief  overview  of 
government  sources  of  demographic  data  for 
microcomputers.    Several  federal  agencies  offer 
data  online  and  on  floppy  disks.    Stu  Weisman 
will  be  telling  you  more  about  government  data 
available  on  floppy  disks  through  NTIS  later  in 
this  session. 

The  Bureau  of  Labor  Statistics  currently  offers 
data  on  diskettes  and  online  through  an 
electronic  news  service.    The  data  diskettes  are 
formatted  for  use  with  Lotus  1-2-3  spreadsheets 
and  cover  topics  like  Consumer  Price  Indices  for 
104  items  and  54  U.S.  cities,  useful  for 
comparing  cost  of  living.    Another  series  has 
monthly  and  annual  average  labor  force  data  by 
age,  sex,  and  race  for  current  year  and  3  prior 
years.    The  online  news  release  service  lets  you 
download  BLS  press  releases  (about  100  a  year, 
which  often  include  data  tables)  for  a  minimal 
fee. 

The Census Bureau has an online service, too,
called CENDATA. For those of you who aren't
familiar with this database, it is offered through
DIALOG Information Services, one of the
largest online vendors. CENDATA offers a
small amount of the Bureau's voluminous data
holdings, including many of their publications
such as press releases, the Monthly Product
Announcement newsletter, and portions of Data
User News. It also provides other population
and economic data from the Bureau.

The  Census  Bureau  offers  some  data  on  floppy 
disks,  too.    I  checked  with  the  Bureau  last  week 
to  get  an  update  on  what's  available.    They've 
had  the  1986  State  and  Metropolitan  Area  Data 
Book  in  disk  form  for  some  time,  as  well  as  the 
1983  County  and  City  Data  Book.    I  was  told 
that  the  1987  County  and  City  Data  Book  is  on 
the  way,  as  are  1986  population  estimates  and 
1985  per  capita  income  estimates  for 
governmental  units  (this  was  announced  as 
already  being  available  through  CENDATA,  but 
apparently  there  was  some  problem  with  the 
data).    The  data  come  in  a  format  that  you  can 
load  onto  spreadsheets  and  other  programs  on 
IBM-type  PCs.    When  you  buy  disks  from  the 
Bureau,  they  send  you  a  little  utility  program 
that  basically  lets  you  view  the  data  in  a  table 
format,  but  little  else. 

The  Bureau  will  also  download  any  data  you 
want  on  to  floppy  disks  on  a  custom  basis,  but 
of  course  this  costs  more  and  probably  has  a 
much  longer  turnaround  time  than  if  you 
purchase  standard  offerings. 

Government data - whether you download them
from an online system or purchase disks - tend
to come in the form of printed tables. In
other words, they're designed to be printed out
and looked at, rather than analyzed. It
takes a little work to get these data into a
spreadsheet format to work with, but it can be
done. And if you use a Macintosh rather than
an  IBM,  there  are  ways  to  transfer  data  for  use 
on  the  Mac.    If  anyone  has  any  questions  about 
this,  ask  me  afterwards. 
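
As a rough idea of the "little work" involved, the Python sketch below turns a print-style table into comma-separated rows that a spreadsheet can import. It assumes that columns are separated by two or more spaces and that numbers are whole numbers with optional thousands separators; real government files may need different rules. The table itself is made up, and the function names are hypothetical.

```python
import csv
import io
import re


def clean_cell(cell):
    """Turn '1,234' into the integer 1234; leave other text untouched."""
    digits = cell.replace(",", "")
    return int(digits) if digits.isdigit() else cell


def table_to_rows(text):
    """Split a print-style table into rows of cells."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) <= set("-="):
            continue  # skip blank lines and ruled lines
        cells = re.split(r"\s{2,}", line)
        rows.append([clean_cell(c) for c in cells])
    return rows


def rows_to_csv(rows):
    """Write rows as CSV text that a spreadsheet program can import."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up table for illustration; not actual Census Bureau output.
printed = """
Area          Population    Households
--------------------------------------
Alpha County     125,400        48,200
Beta County        9,870         3,650
"""

print(rows_to_csv(table_to_rows(printed)))
```

Splitting on runs of two or more spaces keeps multi-word labels such as "Alpha County" in one cell, but it fails when a cell is blank, so a table with empty cells would need fixed column positions instead.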

Private  Sources  of  Demographic  Data 




To  get  more  usable  data,  you  often  have  to  go 
to  private  sources.    There  are  a  number  of 
firms that offer demographic data for
microcomputers  on  floppy  disks,  often  with  the 
software  to  retrieve  and  analyze  them.    As  I 
mentioned  earlier,  some  of  these  systems  are 
enormous  and  costly,  incorporating  massive 
databases  and  the  latest  in  hi-tech  hardware. 
But  some  are  lower-end,  for  use  on  simpler 
PCs. 

A  good  example  of  a  self-contained 
demographic  data/analysis  package  is  Your 
Marketing  Consultant  from  Market  Statistics  in 
New  York.    Market  Statistics  produces  the 
Survey  of  Buying  Power  data  that  appear  in 
Sales  &  Marketing  Management  each  year.    For 
those  of  you  not  familiar  with  this,  the  Survey 
of  Buying  Power  is  not  actually  a  survey  -  it  is 
a  database  of  Market  Statistics'  proprietary 
estimates  and  projections  of  population,  income, 
and  retail  sales  for  states,  metropolitan  areas, 
and  counties.    The  YMC  Advanced  Consumer 
program  comes  with  data  for  all  of  these 
geographic  levels,  as  well  as  for  ADI  and  DMA 
market  areas  (defined  by  Arbitron  Ratings  and 
A.C.    Nielsen  for  the  media  industry).    The 
company  also  has  a  business-to-business  version 
of  the  program.    Your  handout  includes  a  copy 
of  a  column  I  wrote  for  American 
Demographics  explaining  the  capabilities  of  the 
earlier  version  of  this  software  -  it  has  been 
enhanced  since  then  (I  believe  the  price  has 
gone up slightly, too). Basically, the software
can  search  for  areas  of  a  given  geographic  level 
based  on  demographic  and  economic  criteria 
that  you  select  -  for  example,  you  may  want  to 
find  out  which  counties  in  the  South  have  a 
black  population  of  at  least  50,000  and  then 
rank  them  by  median  income.    You  can  do  this 
easily  with  YMC. 
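
The search-then-rank operation described above can be sketched in a few lines of Python. This is not how YMC is implemented; it only illustrates the logic of filtering areas on criteria and sorting the matches. The `County` record, the function name, and all figures are fictional.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    region: str
    black_population: int
    median_income: int


def search_and_rank(counties, region, min_black_population):
    """Select counties meeting the criteria, ranked by median income (high to low)."""
    matches = [
        c for c in counties
        if c.region == region and c.black_population >= min_black_population
    ]
    return sorted(matches, key=lambda c: c.median_income, reverse=True)


# Fictional counties and figures, for illustration only.
counties = [
    County("County A", "South", 82_000, 21_500),
    County("County B", "South", 31_000, 27_900),
    County("County C", "South", 50_000, 24_300),
    County("County D", "West", 95_000, 30_100),
]

for rank, county in enumerate(search_and_rank(counties, "South", 50_000), start=1):
    print(rank, county.name, county.median_income)
```

Here County B is dropped for falling below the population threshold and County D for being outside the region, leaving County C ranked ahead of County A.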

Even some hi-tech products are not expensive
or hard to learn. Slater Hall Information
Products of Washington offers a number of
government databases on CD-ROM - the 1982
Census of Agriculture down to the county level,
a county statistics database with demographic
and economic data, several business/economic
databases, and population statistics including the
entire STF-3C computer tape. SHIP sells the
CD databases with their software, which is
called Searcher. With Searcher, you can
retrieve data by geography or numeric criteria,
view them as tables and export them to Lotus
or other programs for further use. Your
handout includes a review of this program, too.

There are a number of other firms offering
demographic and economic data on floppy disks,
too. I've given you a listing of those we're
aware of. Some are specialized, like Analysis
and Forecasting in Cambridge, which offers IRS
migration data - state-by-state and in- and
out-flows over time. Or the Center for Continuing
Study of the California Economy in Palo Alto,
which offers population and employment
estimates and projections for the state of
California and its counties and cities.

Other companies offer a broader variety of data
- such as CACI of Fairfax. CACI is one of
what we call the "full-service" demographic
data companies. Their data include 1980 census,
current-year, and five-year projections of the
population by age and sex for virtually any
geographic area in the U.S. Other databases
include business and retail information, and the
ACORN geodemographic segmentation system.
(ACORN is an acronym for A Classification of
Residential Neighborhoods). These big
companies also offer entire databases online or
with Bernoulli and CD technology, including the
software to analyze them. Smaller companies
typically only sell the data - you usually have
to figure out how to use them.

Other  Software 

The  list  I've  given  you  of  data  products  doesn't 
include  software  to  analyze  data  -  survey, 
statistical, graphics, and mapping software. I
mentioned  the  survey  and  statistical  software 
earlier.    Displaying  data  with  charts  or  maps  is 
another  important  element  of  demographic 
analysis,  and  there  are  a  number  of  these 
programs  available.    Some  well-known  business 
graphics  programs  include  Harvard  Graphics 
from  Software  Publishing,  Graph  Writer  and 
Freelance  from  Lotus  Development,  and 
ChartMaster  from  Ashton-Tate.    With  these 
programs,  you  can  create  bar,  line,  pie,  and 
other  types  of  charts  directly  from  data  series. 
These  programs  offer  many  features  in  terms  of 
labeling  and  customizing  chart  formats,  and 
allow users to create very effective presentations
of  demographic  information. 

While  you  can  graph  any  kind  of  quantitative 
data,  mapping  is  a  more  specialized  task  -  the 
display  of  geographic  relationships.    Since 
demographic  information  is  often  analyzed  in 
terms  of  geography,  demographic  data  and 
mapping  are  a  natural  combination.    And,  in 
fact,  mapping  software  grew  out  of  the 
demographic  industry  in  the  early  1980s.    People 
had  been  mapping  data  for  a  long  time,  of 
course,  but  it  was  done  laboriously  by  hand  or 
on  mainframe  computers.    Desktop  mapping  was 
not  a  reality  until  Sammamish  Data  Systems 
introduced  DIDS  -  Desktop  Information  Display 
System  -  in  1983,  along  with  demographic  data, 
of  course.    Since  then,  a  number  of  mapping 
programs  have  come  on  the  market,  including 
MapMaster from Ashton-Tate, Atlas*Graphics
from  Strategic  Locations  Planning,  and  MAXpc 
Mapping  Analyst  from  National  Planning  Data 
Corporation.    There  are  even  a  couple  of  micro 
mapping  programs  available  for  the  Macintosh 
now. 


Conclusion

As you can see, there are a lot of options for
demographic analysis in the desktop world. It's
a matter of knowing which data are available
and in what format. Of course, the real
question is what data you need, and why, which
is something you have to determine for yourself
before you go shopping for demographic data.




The Case for Software as Documentation


by David Bearman'

Archives & Museum Informatics
5600 Northumberland Street
Pittsburgh, PA 15217
(412) 421-4638


Abstract

A study of the potential value of software as
historical evidence, conducted by the author in
1987 [published as "Collecting Software: A New
Challenge for Archives & Museums", Archival
Informatics Technical Report, vol. 1 #2,
Summer 1987], raised a number of critical
issues for social science data archives. First, a
number of social control functions that would
traditionally have been the subject of social
scientific analysis are today embodied in
software. Second, the meaning of much
unobtrusively gathered data about our society is
revealed only through analysis of the software
systems that contributed to its collection. And
thirdly, the potential information value of social
science data is captive of the technologies used
to analyse it, and it can only be understood as
an historical agent within the actual context of
its use; e.g., as software of its day permitted its
utilization. For these reasons it is suggested
that software must itself become a focus of
collecting activity in social science archives, and
the implications of the requirement are
explored.


'Presented at the International Association for
Social Science Information Service and
Technology (IASSIST) Conference held in
Washington, D.C., U.S.A. on May 26-29, 1988


Introduction 

Last year I was contracted by the Computer
Museum in Boston to examine the viability of
collecting software as a museum objective.
Previously, I had given little thought to software
per se, and like most archivists had considered
data to be the natural holding of a
machine-readable archive. Since then, however,
I have come to regard social science data
archives as artifactual because they have failed to
collect software and its documentation. I have
been forced to recognize that in important ways
the "facts" we possess about our contemporary
world are created by software and the
assumptions built into instruction sets, and that
social scientists cannot, therefore, be taken
seriously until they fully master the systems that
generate their data. It is increasingly clear that
much of what is essential in the study of
modern economics, political science, and
sociology is the product of software that we
cannot afford to know only from its own
account of itself.

The basis for my perspective is documented in
a  recent  issue  of  Archival  Informatics  Technical 
Report.' One finding of that study will come as
no surprise to data archivists: they are not alone
in failing to collect software as documentary
evidence (a few libraries are collecting it as a
user-requested item, but no archives collect
software as documentation). Secondly, like most
other archives and museum managers, data
archivists find the concept of collecting software
daunting because they usually assume that it
must be retained in its original form and be
"run" on the machine for which it was written.
Neither assumption turns out to be valid, as
very little can be learned about software code
by executing it. Thirdly, we haven't got very
good methods of classifying software, and the
commercialization  of  some  standard  software 
functions  has  not  led  to  a  reduction  in  the 
number  of  applications  developed.    Finally,  the 
history  of  software  is  already  sufficiently 
obscure  that  it  would  be  impossible  to  find  the 
actual  algorithms  (as  opposed  to  the  formulas 
developed  by  the  economists  in  charge)  used  in 
the  calculation  of  the  Commerce  Department's 
monthly  cost  of  living  index,  an  index  which 
affects  virtually  every  aspect  of  American 
economic  life,  or  the  calculation  used  to 
determine  welfare  benefits  in  any  of  the  fifty 
states  in  1976. 

Several  recent  contracts  in  which  I  have  been 
involved  have  extended  the  findings  I  reported 
in  the  summer  of  1987.    In  documenting  the 
events  following  the  takeover  of  the  U.S. 
Embassy in Tehran in 1979, the importance of
software,  in  the  form  of  scenarios  in  guiding  the 
rescue  mission,  has  become  clear.^  In 


'Bearman, David; "Collecting Software: A New
Challenge for Archives and Museums", Archival
Informatics Technical Report vol. 1, #2,
Summer 1987.

'The  National  Security  Archive,  a  non-partisan, 
non-profit  organization  devoted  to  documenting 
contemporary  U.S.  foreign  and  security  policy,  is 
compiling  the  record  of  these  events.    The 
assumption built into software used in desert
rescue  "games"  is  a  critical  element  of  the 
record. 


documenting  United  Nations  policy  making,  the 
role  of  software  in  decision  support  systems  and 
mail  networks  has  begun  to  concern  records 
managers."  In  documenting  business  strategies  of 
the  largest  corporation  in  the  United  States,  the 
role  of  software  embodied  in  electronic 
switching  circuits  has  been  one  focus  of  a 
monopolistic  practices  suit' 

Early  in  1987,  Alan  Kowlowitz,  a  records  analyst 
with  the  New  York  State  Archives,  was  assigned 
to  appraise  the  records  of  the  New  York  State 
Division  of  Criminal  Justice  Services  including 
two  integrated,  online,  information  systems:  the 
Computerized  Criminal  History  System  (CCH) 
and  the  Offender  Based  Transaction  Statistics 
System  (OBTS).'  Like  some  other  observers,' 
Kowlowitz  reported  that  the  changing  nature  of 
documentation  practices  in  organizations 
employing  electronic  information  systems  is 
posing  more  fundamental  appraisal  problems  for 
archivists  than  the  appearance  of  traditional 
records  in  electronic  media  formats  up  until  the 
past  few  years  ever  did.    The  guidance 


"The United Nations Administrative
Coordination Committee Information Systems
Group (ACCIS) has hired me to develop policy
guidelines for management of electronic records
as a reflection of this concern.
'AT&T is not, of course, the only firm charged
with using software in this way - American
Airlines admitted its Sabre reservation system
was biased in its favor and agreed to desist.
Any reader of business journals knows that
many firms are using software as a conscious
element of business tactics.
'Kowlowitz, Alan; "Hands up, you're under
arrest: Appraising Criminal History Data in the
Age  of  the  Electronic  Case  File",  paper 
prepared  for  the  SAA  meeting,  6  September 
1987;  "Appraisal  Study  of  the  Computerized 
Criminal  History  System  (CCH)  and  the 
Offender  Based  Transaction  Statistics  Systems 
(OBTS)  of  the  Division  of  Criminal  Justice 
History",  New  York  State  Archives,  Albany, 
NY,  1987. 

'Aronsson,  Patricia  and  Brown,  Tom; 
"Government  Archivists  and  Government 
"Automation: The Odd Couple", Government
Publications  Review,  13,  1986,  p.561-570. 




developed  for  procedural,  centralized, 
information  management  systems  designed  to 
serve  a  single  organization  is  found  wanting  for 
non-procedural,  decentralized, 
inter-organizational  databases.    Kowlowitz'  study 
of  New  York's  CCH  &  OBTS  illustrates  the 
nature  of  these  challenges. 

The  New  York  CCH  was  created,  along  with 
similar  systems  in  other  states,  by  grants  from 
the  Law  Enforcement  Assistance  Administration 
starting  in  1966  to  improve  the 
inter-jurisdictional  flow  of  information  in 
support  of  more  effective  administration  of 
criminal  justice.    Offender-based  Transaction 
Systems,  OBTS's,  were  introduced  in  1972. 
Both  represented  a  significant  departure  from 
agency  based  and  function  based  information 
systems,  but  one  that  is  increasingly  common  in 
other  domains  as  well  as  criminal  justice.    Both 
are  also  part  of  a  larger  conceptual  (although 
not  fully  implemented)  network  emanating  from 
the  FBI's  National  Crime  Information  Center 
(NCIC)  which  maintains  an  "Inter-state 
Identification  Index"  running  on  its  National 
Law  Enforcement  Teletype  System  Network. 
Each  is  fed  by  agencies  and  events  at  a  local 
level  -  local  police  forces,  district  attorneys, 
court  hearings,  parole  boards  and  the  like. 

In  this  system,  one  agency  (local  police)  records 
information  about  an  arrest,  and  others  (DA's, 
Courts,  Corrections,  Parole)  add  information 
about  the  offense  and  the  offender,  disposition 
of  the  case  etc.    Various  agencies  are  entitled  to 
see  and  use  different  information  in  the  online 
system  in  conjunction  with  their  daily  work.    All 
CCH/OBTS  data  is  considered  confidential  and 
access  is  restricted,  but  some  data  is  sealed 
either  by  court  order  or  laws  pertaining  to 
juvenile  offenders.    From  time  to  time  data  may 
be  purged  from  the  system  either  in  order  to 
make  room  in  the  computer  or  to  comply  with 
a  court  order.    The  DBMS  itself  is  quite 
complex, but it contains little unique
information and is not as complete in its parts
as information resident in the contributing
agencies. The database contains few clues as to
its use; query audit trails are not provided and
data entry audit trails disappear with
transactions. The system is linked dynamically
with PROBAMIS (the Probation Management
Information System) and PARMIS (the Parole
Management Information System), but not all
information is passed along in a timely fashion
and some stages in many cases are missing.
Doubtless some cases of mistaken identity are
recorded and some national searches in the
NCIC/FBI system result in false drops. Data
may also be sealed by court order (indeed 25%
of all cases are) and be unavailable when a
subsequent record is made, leading to data
redundancy and duplicate (or near duplicate)
records. Needless to say, the database is not
entirely clean and not completely trustworthy.


Data  As  Artifact:  Comments 

My  reasons  for  introducing  the  Kowlowitz  study 
are to discuss the data we have been accustomed
to  retaining  from  such  databases  and  what 
information  we  as  social  scientists  should  be 
studying  about  such  systems. 

The  first  point  we  can  safely  make  about  such  a 
database  is  that  if  the  data  from  it  can  be  saved 
in  a  machine  readable  form,  respecting  privacy, 
it  could  be  a  valuable  source  for  a  broad  range 
of  sociological,  political,  economic  and  other 
studies.    And  it  will  doubtless  be  used. 

The  next,  equally  obvious  point  is  that  data 
obtained  from  such  active  governmental  records 
series  is  problematic  because  its  collection  is  not 
controlled  well.    Already  we  are  aware  from 
Kowlowitz  that  some  police  agencies  are  too 
small  to  report  in  a  timely  fashion  and  some  do 
not report at all. Some courts are more regular
than  others  and  the  actions  of  the  Corrections 
Department  occasionally  go  unreported.    We 
may  hope  that  the  causes  of  bias  in  the  data 
are  non-systematic,  but  are  they?    For  reasons 
of scholarly integrity, we will want to
establish any sources of artifact in this database
before using it. How would we begin?

If  these  were  manual  files  we  would  ask  about 
the  training  of  the  clerks,  the  procedures  they 
used  to  determine  whether  or  not  a  record  of 
arrest  was  for  someone  already  represented  in 
the  database,  what  information  was  available  to 
the arresting officer (DA, judge, parole officer,
etc.) and how it was used, and how often each
office made what additions/corrections to the
records. To ask these questions in the
electronic environment is to ask about how the
software  operating  the  system  was  programmed. 

Let  us  consider  a  more  complicated  social  use 
of  the  data,  in  which  the  focus  is  on  the 
individual  case.    A  victim  sues  the  state  because 
his  attacker  was  prematurely  released  from 
prison  due  to  inaccurate  data  in  the  system. 
The case comes to court and the data in the
system is correct. We must establish whether
the data was correct when the parole hearing
took place and why the parole board did not
consider the damaging evidence. Are they
negligent, or was the system incorrectly
designed?    We  cannot  know  without  studying 
the  software.    Only  the  programming  code  will 
tell  us  whether  the  system  withheld  restricted 
data  from  the  parole  board,  failed  to  find  the 
data  due  to  error  on  the  part  of  the  parole 
board  or  on  the  part  of  the  system  designers,  or 
didn't contain information it now includes, and
if  so  how  it  can  be  established  when  it  was 
added.    Data  from  such  systems  is  as  mute 
about  what  it  means,  and  what  is  missing,  as 
data  from  an  experiment  without  protocols. 


Software  and  Organizations 

Software  is  being  developed  by  organizations  to 
respond  to  the  conditions  in  which  they  find 
themselves. Whether it is designed to calculate
buying or selling points for stocks, or best
routes for air travel, it embodies the
assumptions of an organization and constrains the
way they subsequently do business. These
assumptions and constraints may backfire, as the
collapse of some Wall Street brokerage firms in
October 1987 appears to demonstrate (and in
this I am following the Rogers?? Report). On
the other hand, as American Airlines appears to
have demonstrated in two decades of tweaking
its Sabre reservation systems, they can also make
for success. But how will we understand these
organizations, and their strategies, in the future
without access to software documenting just how
they operated? Will a future historian be able
to  write  a  contemporary  equivalent  to  Alfred 
Chandler's  classic  Strategy  and  Structure' 
without  software  to  study? 

As  more  and  more  of  the  "business  rules"  that 
guide  organizations  become  embodied  in 
software,  we  are  less  able  to  understand  an 
organization  without  understanding  its  software. 
In  the  electronic  mail  system,  who  has  access  to 
the  President's  mail  box?    If  the  comptroller  did 
not see the negative cash flow trend, could he
have with a single command on his PC, or was
part of the problem with the firm the way in
which cash flow had been obscured in setting
up  accounts? 

In  the  private  sphere,  we  do  not  have  the  clout 
to  insist  that  software  be  retained,  except  in 
some heavily regulated arenas, but for public
agencies that affect the lives of numerous
individuals  and  groups  in  administration  of 


'Chandler, Alfred D., Jr.; Strategy & Structure:
Chapters in the History of the American
Industrial Enterprise. Cambridge: MIT Press,
1962.


every  regulation,  and  do  so  with  criteria  built 
into  software,  it  is  an  issue  of  accountability. 
When  the  immigrant,  a  Pakistani  child-bride  of 
a  permanent  resident,  was  denied  temporary 
resident's  status,  was  the  denial  (made  by 
computer  processing  of  facts  in  her  application 
and his record) correctly interpreting the intent
of  Congress? 

Indeed,  social  scientists  will  increasingly  find,  if 
they  have  not  already  discovered  it,  that 
software  available  to  them,  and  to  organizations 
in  our  contemporary  society,  enables  them,  or 
restricts  them.    How  we  interact  with 
organizations,  no  less  than  how  we  interact  with 
data,  is  becoming  a  product  of  software 
interfaces  and  functions. 


Collecting  Software 

It's  one  thing  to  realize  that  software  is  itself  a 
source of evidence about the functioning of
society and a crucial key to understanding the
execution of organized activity in our age, and
quite another to do something about
documenting it. Our first problem is that we
are  very  uncertain  about  what  we  would  need 
to  keep  in  order  to  adequately  document 
software. 

Should we retain a running version of the system,
i.e.  the  panoply  of  licensed  software  (object 
code)  that  operated  in  an  application  arena? 
This  is  impossible  as  anyone  who  has  been 
responsible  for  systems  software  in  a  large 
organization  is  painfully  aware.    Even 
documenting  what  version  of  an  application 
system  was  running  with  what  releases  of 
operating  system,  telecommunications  monitors, 
report  writers,  and  hardware  configurations  is 
exceptionally  difficult,  and,  in  practice,  such 
"configuration  management"  of  computing 
facilities is found more in exhortation than in
practice. Actually maintaining running software
systems  as  they  were  over  their  active  life 
would,  of  course,  also  involve  maintaining 
hardware,  which  is  at  least  prohibitively 
expensive,  if  not  impossible. 

If  we  don't  retain  a  usable  software  system, 
what  are  the  best  sources  of  evidence  about 
how  it  ran?    Interestingly,  the  answer  is  not  as 
simple  as  keeping  the  source  code,  although 
much  can  be  learned  from  source  code 
(assuming  it  is  written  in  a  language  that  can  be 
understood  by  the  researcher).    It  is  extremely 
difficult to reconstruct what it feels like to use
a  system  from  its  source  code,  so  external 
functional  specifications  and  tutorials,  user 
documentation  and  even  films  of  the  system  in 
use,  are  important  documentation.    Finally, 
assuming  we  want  to  understand  not  just  what 
the  software  was,  but  how  it  came  to  be  that 
way,  we  will  want  to  retain  design  specifications 
and  early  drafts  of  important  routines  and 
algorithms.    In  other  words,  collecting  software 
as  documentation  is  really  no  different  than 
collecting  documentation  of  any  other  activity  of 
an  organization,  and  does  not,  ultimately, 
involve  actually  collecting  the  stuff  normally 
thought  of  as  software,  e.g.,  object  code. 


Conclusions 

Responsible  social  science  research  on  large  data 
collections  requires  that  we  understand  the 
software  in  which  such  aggregations  of 
information  came  to  be  collected. 
Understanding  the  way  in  which  people  interact 
with  their  society  in  the  late  twentieth  century 
also  requires  that  we  understand  the  nature  of 
the  systems,  run  by  programs,  that  they  are 
interacting  with.    Finally,  we  can  only 
understand  the  ways  in  which  organizations,  and 
even  some  individuals,  succeed  or  fail  in  our 
society if we appreciate the fact that their
successes  and  failures  are  mediated  by  computer 
programming  decisions,  embodied  in  code,  that 
reflect  their  expectations  of  the  way  in  which 
the  external  world  will,  or  should,  behave  and 
how they will respond to it. For all these
reasons,  we  must  prepare  ourselves  to  collect 
software  as  documentation. 

Do  we  know  what  to  do  with  this 
documentation?    At  present,  no.    We'll  need  to 
learn  how  to  read  these  new  sources  of 
evidence, however, if we are going to make sense
of the political, economic, social and even
cultural worlds in which we live.




Is Data Redundancy the Price Archivists Will Pay for Adequate Documentation

by Margaret Hedstrom'
New York State Archives & Records Administration


Federal,  state,  and  local  government  agencies  in 
the  United  States  are  major  repositories  of 
important social, scientific, and economic data.
By  automating  many  of  their  basic  record 
keeping  functions,  agencies  at  all  levels  of 
government  have  become  stores  of  vast 
quantities  of  data  on  citizens  and  on  public 
programs.    Unfortunately,  data  acquisition, 
preservation,  and  dissemination,  especially  by 
state and local government archives, have not
developed  apace.    Warnings  about  the  loss  of 
important  contemporary  and  historical  records  at 
the  federal  level  should  be  multiplied  many-fold 
when  considering  state  and  local  government 
records.    Few  states  have  addressed  the  issue  of 
preserving records in machine-readable form,
while not a single local government archives
program preserves electronic records.^

This  article  examines  some  of  the  implications 
of  increased  data  sharing  among  local,  state,  and 
federal  agencies  for  the  acquisition,  preservation 
and  dissemination  of  data  by  archives.    For 
social  science  data  archivists  and  others  not 
familiar  with  government  archives,  it  is 
important  to  point  out  that  traditional  archives 
differ  in  some  respects  from  social  science  data 
archives  and  libraries.    Government  archives 
identify,  preserve,  and  make  available  public 
records  with  enduring  value  for  historical  or 
other  research.    Traditional  archives  are 
concerned  with  maintaining  records  of  how 
government  agencies  performed  their  mandates 
as  well  as  records  of  the  individuals, 
organizations,  and  other  phenomena  that  were 
influenced  by  that  mandate. 

Government  archives  acquire  and  preserve 
administrative data mostly in the form of
traditional paper files. In the process of
regulating a myriad of activities and providing a
wide range of direct services, government
agencies  collect  a  wealth  of  data  (an  increasing 
portion  of  it  in  machine-readable  form)  on 
almost  every  aspect  of  social  activity.    The  vast 
majority  of  the  data  collected  by  state  and  local 
government  agencies  is  compiled  to  administer 
programs  while  research  use  remains,  at  best,  a 
secondary  consideration.    The  federal 
government  is  similar  to  state  and  local 
governments  in  this  respect,  but  state  and  local 


'Presented  at  the  International  Association  for 
Social  Science  Information  Service  and 
Technology  (IASSIST)  Conference  held  in 
Washington, D.C., U.S.A. on May 26-29, 1988.


^For a discussion of the problems of preserving
electronic records from federal government
agencies, see the Committee on the Records of
Government, Report (Washington, D.C.: Council
on Library Resources, 1985). In recent years
several states have begun to address the
problems of machine-readable records, most
notably New York, Kentucky, Washington,
Ohio, Delaware, and Wisconsin. However, no
state has a fully developed program for
acquisition, preservation, and dissemination of
data. Archival programs for local government
records are even less developed.




governments  rarely  conduct  original  research  and 
they  do  not  maintain  independent  statistical 
agencies. 

Since  the  1960s,  federal,  state,  and  local 
government  agencies  have  developed  new 
information  systems  which  collect  data  from 
private citizens, business, and other government
agencies  and  distribute  data  to  all  of  these 
constituencies.    No  level  of  government  exists  as 
an  isolated  island  today.    Increasingly, 
information  flows  between  levels  of  government 
using  transmission  methods  that  range  from  the 
primitive  shipment  of  paper  forms  to  real-time 
transfers  through  complex,  interactive  computer 
networks. 

Intricate  information  flows  pose  new  problems 
for  archivists  who  work  in  traditional 
government  archives  because  they  challenge  a 
parochial  vision  that  is  bounded  by  the  limits  of 
a  single  government's  jurisdiction.    Traditionally, 
the  National  Archives  has  preserved  federal 
records,  state  archives  have  preserved  state 
records,  and  local  archives  (where  they  exist) 
have  preserved  local  government  records.    This 
structure  is  inadequate  for  identifying  and 
preserving  valuable  information  from  shared 
functions,  because  the  systems  designed  to 
process  and  facilitate  data  exchanges  exceed  the 
boundaries  of  one  level  of  government,  while 
provenance  and  data  ownership  become 
unclear.'

Different  levels  of  government  exchange  data 
for  specific  purposes.    Data  sharing  occurs  when 
federal, state, and local agencies jointly regulate
activities,  administer  basic  functions,  or  share  in 
the  delivery  of  services.    Social  welfare  and 
transportation  programs  in  which  states  and 
localities  provide  services  that  meet  uniform 
standards  in  exchange  for  modest  amounts  of 
federal  funding  are  examples  of  this  type  of 


'Provenance  is  the  principle  of  grouping  public 
records  according  to  their  origins  in  the 
administrative  structure. 


data  exchange.    Different  levels  of  government 
also  exchange  data  when  there  is  a  willingness 
to  share  information  that  is  necessary  for 
administering  functions  which  exceed  a  single 
jurisdiction.    Data  available  at  the  local,  state, 
and  federal  level  to  track  and  apprehend 
criminals is perhaps the best example of this
type of data sharing. Public agencies may also
exchange data when the use of publicly available
data is more expedient or less expensive than
independent data collection. In the area of
educational statistics, a function with a
limited federal funding or regulatory role, it
is more expedient for the federal government
to impose reporting requirements on states, and for
states  to  impose  reporting  requirements  on 
localities,  than  it  is  to  conduct  independent 
research  surveys  on  educational  programs. 

The  increasing  transfer  of  data  between  levels 
of  government  reflects  basic  changes  in  the 
administration of government programs and the
delivery  of  public  services.    In  the  last  two 
decades,  state  and  federal  agencies  have 
decentralized  responsibility  for  most  direct 
service  delivery.    At  the  same  time,  government 
agencies  have  responded  to  a  real  or  perceived 
demand  for  greater  accountability.    This  dual 
goal  of  decentralization  and  enhanced 
accountability  is  handled  operationally  by 
passing  large  volumes  of  information  between 
regulating  or  funding  agencies  and  the  agencies 
that  directly  perform  government  functions.    A 
more  recent,  but  parallel  trend  is  subcontracting 
with  the  private  sector  for  a  growing  portion  of 
the  direct  services.    The  technological  capability 
to  transfer  data  between  systems  is  another 
factor  in  the  growth  of  data  exchanges,  but  is 
not  the  primary  reason  for  the  increasingly 
complex  data  transfers  in  the  last  two  decades. 

Two  examples  illustrate  the  complexity  and  the 
volume  of  the  information  flows  between  levels 
of government. The Medicaid Management
Information  System  (MMIS)  —  a  state-level 
system  found  in  most  states  —  is  used  to 
determine eligibility, monitor fees, process
claims,  and  evaluate  the  program's  costs  and 
effectiveness.    In  Utah's  MMIS,  the  claims 
processing  portion  has  more  than  100 
machine-readable  master  files  and  it  produces 
316  different  output  reports.    The  system 
produces  six  truckloads  of  paper  and  nearly 
20,000  sheets  of  computer  output  microfiche 
each  month.    Many  of  the  output  reports  are 
transferred  to  the  federal  government  because 
they  are  mandated  by  reporting  requirements. 
But  information  is  also  exchanged  among  local 
social  service  agencies,  public  hospitals  and 
clinics,  insurance  companies,  and  private 
providers."  The  Medicaid  system  may  be  one  of 
the  largest  and  most  complex  examples  of  a 
jointly  administered  program,  but  it  exemplifies 
the  complicated  information  needs  of  many 
public  health  and  social  welfare  programs. 

A  second  example  is  the  national  criminal 
records  system,  initially  designed  and  funded  by 
the  Law  Enforcement  Assistance  Administration 
(LEAA)  in  the  late  1960s  and  1970s.    A 
complex  network  with  data  on  criminal  histories, 
criminal  identities,  warrants,  and  the  like  allows 
the  transfer  of  data  vertically  between  local, 
state,  and  federal  law  enforcement  officials,  and 
laterally  among  criminal  justice  agencies  within 
and  between  states.    In  addition  to  identification, 
socio-demographic  background,  and  criminal 
history data on millions of offenders and
suspects, the system contains data on significant
actions taken by police agencies, district
attorneys, courts, probation departments,
correctional institutions, and parole boards.' A


'Ken White, "We Have the Program, Now We
Need Federal Approval," unpublished paper
presented at the annual meeting of the Society
of American Archivists, Sept. 5, 1987. For
another example of an inter-governmental
information system, see Robert H. Crowley and
James J. Heaphey, The Welfare Management
System in New York State: A Case Study of
Management Information Systems in
Government (Albany, NY: Rockefeller Institute
of Government and the Governor's Office of
Employee Relations, Oct. 1984).
'Alan  Kowlowitz,  "Hands  Up,  You're  Under 


major  impetus  for  the  system  was  the 
recognition  that  fragmented  information  residing 
with  the  Federal  Bureau  of  Investigation,  fifty 
state police organizations, and thousands of
local police agencies was ineffective for tracking
and apprehending criminals who have little
respect for municipal boundaries or state
borders. 

Data sharing is not limited to vertical transfers
between the various levels of government.
Responsibility  for  specific  government  functions 
is  not  always  confined  to  a  single  agency. 
Criminals,  for  example,  are  handled  by  local 
police,  district  attorneys,  courts,  correctional 
institutions,  and  probation  departments.    They 
may  also  have  a  history  of  substance  abuse,  a 
family  on  public  assistance,  chronic  health  or 
mental  health  problems,  and  a  need  for 
vocational  education.    Similarly,  transportation, 
environmental  conservation,  parks,  land  use 
planning,  and  taxation  departments  all  perform 
functions  that  affect  a  single  geographic  area. 
Some local governments that have computerized
recently find that the major advantage to
automating is the ability to combine data on
clients who are served by many different
programs or on one specific geographical area.'

In  spite  of  many  automated  systems  for  data 
exchange,  the  potential  for  data  sharing  far 
exceeds what is current practice today.
Bureaucratic,  administrative,  and  technical 


'(cont'd) Arrest: Appraising Criminal History
Data  in  the  Age  of  the  Electronic  Case  File," 
unpublished  paper  presented  at  the  annual 
meeting  of  the  Society  of  American  Archivists, 
Sept. 6, 1987. (Forthcoming in Archival
Informatics  Technical  Reports.  1989). 
'Rob Gurwitt, "The Computer Revolution:
Microchipping Away at the Limits of
Government," Governing the State and
Localities 1 (May 1988), pp. 3F?5. Interest in
recombining disparate data sets has spawned the
development of geographic information systems
at the local and state level. See Government
Technology, special issue on computerized
mapping, Vol. 1, #5 (Sept./Oct. 1988).




obstacles  to  data  sharing  create  barriers  to  the 
free  flow  of  information.    The  existence  and 
availability  of  administrative  databases  is 
generally  not  well  known  even  to  those  working 
within  one  level  of  government.    Some  agencies 
are  possessive  of  their  data  and  prefer  not  to 
exchange  data  for  a  variety  of  reasons. 
Moreover,  administrative  data  systems  usually 
are  designed  for  a  very  specific  purpose  with 
idiosyncratic data structures, unique data
definitions,  and  poor  documentation.'  Although 
an  administrative  data  set  might  contain  data 
related  to  a  secondary  application,  it  may  not  be 
specifically  useful  for  another  purpose.    Finally, 
data  exchanges  are  technically  difficult  because 
data  interchange  standards  have  not  been  widely 
adopted.    All  of  these  factors  inhibit  data 
sharing  and  lead  to  redundant  data  collection. 

The records and data that document functions
shared  by  federal,  state,  and  local  governments 
create  several  problems  for  data  archives.    One 
problem  is  data  redundancy.    When  local 
authorities  report  to  the  state  authorities,  more 
often  than  not,  they  maintain  copies  of  the  data 
(or the hard copy records) that they transmit.
When state agencies report to a federal agency,
they are likely to also maintain copies of the
information they transmit. In some cases, such
duplication is actually required by federal and
state regulations. Redundancy is even more
apparent when information flows in the other
direction.    Policy  directives  and  statistical  data 
from  a  federal  agency  may  be  duplicated  in  all 
fifty  states,  and  duplicated  again  in  thousands  of 
local  government  agencies.    Seen  from  this 
perspective,  the  greatest  problem  facing 
archivists  appears  to  be  the  overabundance  of 
machine-readable  data  —  little  of  which  is 
unique. 
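
Whether two archives actually hold identical copies of a data set can be checked mechanically before any appraisal decision is made. The following is a minimal sketch in Python, assuming each holding can be read as a list of text rows. The holder names and the sample rows are hypothetical, and the SHA-256 fingerprint is simply one convenient choice, not a practice reported by any of the agencies discussed here.

```python
import hashlib


def fingerprint(rows):
    """Hash a data set's rows in sorted order so that row order does not matter."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")  # separator so adjacent rows cannot run together
    return digest.hexdigest()


def group_identical(holdings):
    """Return lists of holders whose data sets have the same fingerprint."""
    groups = {}
    for holder, rows in holdings.items():
        groups.setdefault(fingerprint(rows), []).append(holder)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical holdings at three levels of government.
holdings = {
    "local": ["case-001,approved", "case-002,denied"],
    "state": ["case-002,denied", "case-001,approved"],  # same rows, different order
    "federal": ["approved,1", "denied,1"],              # summary counts only
}

print(group_identical(holdings))
```

A check of this kind identifies only exact copies. It says nothing about data sets that overlap in part, or about identical data that were used differently at each level of government.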


Data  redundancy,  however,  is  not  the  most 
challenging  problem  of  complex, 
intergovernmental  data  flows.    It  is  a 
tremendous  waste  of  limited  resources  for 
archives  to  preserve  identical  data  sets  at  the 
local,  state,  and  federal  levels  when  much 
unique and valuable data is lost. Yet some
duplication  of  data  may  be  the  price  that 
archivists  will  have  to  pay  if  we  want  to 
preserve  usable  data  and  comprehensive 
documentation  of  shared  functions.    A  more 
challenging  problem  for  archives  is  the  need  to 
develop  approaches  to  the  analysis,  appraisal 
and  selection  of  data  that  transcend  the 
boundaries of a single level of government.
Government  archivists  who  analyse  large 
administrative  data  systems  recognize  that 
appraisal  must  begin  with  a  comprehensive 
overview  of  the  system,  its  basic  functions,  the 
general  types  of  data  it  handles,  and  the  types 
of  output  it  provides  —  through  both  hard  copy 
reports  and  potential  on-line  queries.'  Such  an 
overview  allows  the  archivist  to  recognize  the 
basic  logic  of  the  system  and  to  identify  key 
areas  for  more  detailed  appraisal.    An  archivist 
who  approached  the  Medicaid  Management 
Information  System  by  systematically  analyzing 
each  of  the  316  output  reports,  would  be 
hopelessly  lost  before  gaining  even  a  slight 
semblance  of  why  these  reports  were  created  or 
how  they  were  used. 

To  develop  an  overview  of  a  system  like  MMIS, 
which  is  designed  in  part  to  transmit 
information  between  levels  of  government, 
archivists  must  gain  a  perspective  that  accounts 
for  the  flow  of  information.    Unfortunately,  no 
mechanisms  exist  yet  for  approaching  appraisal 
in  this  way.    Although  archivists  may  consult 
other  levels  of  government  to  determine 
whether  the  records  they  are  appraising  are 
being  preserved  elsewhere,  this  information  is 


'New  York  State  Criminal  Justice  Information 
Systems  Improvement  Program.  Measurement 
Issue in Prison and Jail Overcrowding (Albany:
Division  of  Criminal  Justice  Services,  May 
1988). 


'Thomas  Elton  Brown,  "Appraisal  in  the 
Information  Age,"  paper  presented  at  the  annual 
meeting of the Association of Canadian
Archivists,  June  1987. 




not  readily  available  even  for  small  sets  of 
traditional  records.    More  exchange  of  appraisal 
information  among  archives  is  an  essential  first 
step,  but  appraisal  of  complex  information 
networks  will  require  far  more  than  simply 
sharing  information  about  records  preserved  at 
various  levels  of  government.'  Joint  analysis  and 
appraisal  projects  need  to  be  carried  out 
simultaneously  by  archivists  working  at  the 
federal,  state,  and  local  levels.    The  objective  of 
such  projects  is  not  necessarily  to  avoid 
duplication  of  data  between  levels  of 
government. Rather, it is to select data that
adequately  documents  how  each  level  of 
government  performed  its  responsibility  for  a 
shared  function.    If  local  and  state 
administrators  used  the  same  data,  but  used  it 
in different ways with quite different
implications  for  social  service  recipients,  for 
example,  adequate  documentation  might 
necessitate  some  duplication. 

Another  concern  with  shared  functions  is  that 
no  single  level  of  government  maintains  a 
comprehensive  collection  of  data  on  a  particular 
program  or  its  recipients.    Some  database 
management  systems  provide  centralized  sources 
of  data,  but  the  networks  that  support  large, 
shared  government  functions  do  not  centralize 
all  of  the  information  in  one  place.    Rather, 
selected  pieces  are  passed  between  different 
levels  of  government  with  no  single  point  of 
data  compilation.    For  functions  that  are  carried 
out  primarily  by  local  authorities,  the  richest 
and  most  detailed  case-level  information  is 
likely  to  remain  at  the  local  level.    While 
detailed  case  information  may  be  of  greatest 
interest  to  sociologists,  economists,  social 
historians  and  other  researchers,  local 
governments  have  demonstrated  little  capability 


'A  project,  funded  by  the  U.S.    National 
Historical  Publications  and  Records  Commission, 
will  explore  the  use  of  the  Research  Libraries 
Information  Network  (RLIN)  for  the  exchange 
of  appraisal  data  among  15  state  and  local 
governmental  archives,  plus  the  U.S.    National 
Archives. 


to  preserve  the  information  in  machine-readable 
form.  Moreover,  the  records  maintained  by  any 
single  locality  cannot  provide  a  comprehensive 
picture  of  statewide  or  national  programs.  Data 
maintained  by  federal  agencies  may  provide  the 
necessary  breadth,  but  lack  the  detail  that  is 
essential  for  social  science  and  policy  research. 

Responsibility for preserving and disseminating
data  from  large  networks  that  transcend  a  single 
governmental  jurisdiction  is  also  unclear.    Is 
data,  collected  by  local  governments  to 
administer  local  social  assistance  programs  and 
reported  to  a  state  or  federal  agency  to  meet 
reporting  requirements,  the  responsibility  of  the 
local  government  that  collected  it  originally  or 
of  state  and  federal  agencies?    This  may  seem 
like  a  relatively  simple  bureaucratic  question, 
but  as  long  as  the  issue  of  data  ownership 
remains  unresolved,  archives  may  lack  a  clear 
mandate  for  collecting  it  or  may  be  unwilling  to 
assume  responsibility  for  its  preservation.    With 
large  information  systems  that  are  used  to 
administer  major  social,  educational,  or 
regulatory  programs,  archives  at  each  level 
should  preserve  some  pieces  of  the  system,  but 
without  cooperative  approaches  to  appraisal  and 
preservation,  archivists  will  never  know  which 
pieces  to  preserve. 

Data  interchanges  also  raise  problems  of  data 
integrity  and  data  quality.    Very  large  databases 
that  collect  data  from  hundreds  of  sources  often 
have  very  high  error  rates.    In  spite  of 
well-intentioned and elaborate efforts to
mandate  standards  for  data  quality,  there  are 
few  effective  mechanisms  to  monitor  data 
providers  or  to  maintain  quality  standards. 
Moreover,  local  officials  are  unlikely  to  invest 
much  effort  in  providing  high  quality  data  as 
long  as  they  view  their  data  contributions  as 
little  more  than  meeting  mandated  reporting 
requirements for which they receive little useful
information  in  return.    This  is  especially  true 
when  their  primary  responsibility  is  to  provide 
direct services to clients under increasing fiscal
constraints.¹⁰ The ability to download parts of a
database,  combine  it  with  other  sources  of  data, 
and  manipulate  the  information  for  new 
purposes  also  threatens  data  integrity.    Even  if 
archives  develop  the  capability  of  preserving 
selected  pieces  of  these  databases,  archivists  may 
not  be  able  to  document  the  data  accurately  and 
precisely  enough  for  secondary  use. 

The  enhanced  capability  to  exchange  data  in  the 
past  few  years  has  also  increased  concern  over 
privacy  and  access  to  information.    Although 
privacy  and  access  considerations  have  been  an 
ever-present  theme  in  data  archives,  the 
enactment  of  privacy  protection  provisions  and 
the  establishment  of  guidelines  and  procedures 
for  handling  confidential  data  quelled  some  of 
the  concern  over  unwarranted  invasions  of 
privacy  during  the  1970s  and  early  1980s.    This 
issue  has  surfaced  again  for  several  reasons. 
With  the  recent  increase  in  automation  at  the 
local  government  level,  local  governments  are 
beginning  to  amass  large  quantities  of  personal 
data  in  machine-readable  form.    Local 
governments  may  or  may  not  have  guidelines  in 
place  to  administer  such  data,  but  they  generally 
lack  the  experience  of  federal  and  state  agencies 
with  this  problem.    There  is  also  a  growing 
interest  in  the  commercial  sector  in  acquiring, 
linking,  repackaging,  and  selling  information 
from  public  records  at  all  levels  of 
government.¹¹ Finally, some types of linkage and
data interchange that were technically
challenging or too expensive to consider in the
1960s are quite feasible today.


¹⁰ For discussion of the conflicting interests of
state and local administrators, see James J.
Heaphey and Robert H. Crowley, Standardizing
Welfare Management: The State Versus the
Counties (Albany, NY: Rockefeller Institute of
Government and The Governor's Office of
Employee Relations, Oct. 1984).
¹¹ Massachusetts Office of the Secretary of State,
Public Records Division. Report of the First
National Conference of Issues Concerning
Computerized Public Records. Boston, Mass.,
1987.


The  response  to  new  data  linkage  capabilities 
may  be  new  efforts  to  restrict  access  to  public 
records. Records that are innocuous and pose
no threat to privacy may become restricted
simply  because  they  have  the  potential  to 
threaten  personal  privacy  when  linked  with 
other  records.    Access  restrictions  are  also 
problematic  for  systems  that  exchange  data 
between different levels of government. Because
federal,  state,  and  local  freedom  of  information 
and  privacy  laws  are  not  identical,  archivists  or 
public  records  custodians  who  administer  the 
data  must  determine  which  restrictions  apply. 
This is a challenging problem when issues of
ownership  remain  unresolved. 

The  growing  use  of  proprietary  software  for 
large  integrated  networks  may  also  threaten 
access  and  compromise  the  ability  of  archives  to 
preserve  data.    Many  large  integrated  networks 
use  software  that  is  protected  by  copyrights  or 
licensing  agreements.    Because  some  data  cannot 
be  used  separately  from  software,  special 
agreements  may  be  necessary  to  allow  for  its 
preservation  and  access  in  an  archives.    Finally, 
the  transfer  of  public  sector  functions  to  private 
facilities  may  limit  access  to  data  unless  policies 
are  developed  to  clearly  define  such  records  as 
part  of  the  public  record. 

Just  as  the  exchange  of  information  among 
local,  state  and  federal  agencies  poses  new 
challenges  for  data  archivists  in  the  public 
sector,  this  development  may  also  foster  positive 
changes  in  the  field.    The  need  to  exchange 
data  and  documents  for  administrative  purposes 
may  hasten  the  development  and  adoption  of 
data  interchange  standards.    Federally  mandated 
reporting  requirements  already  impose  some 
degree  of  standardization  on  the  data  collected 
to  document  a  wide  variety  of  program 
activities,  and  these  federally  mandated 
standards  make  it  possible  to  identify  fairly 
consistent  data  sets  in  localities  across  the 
country.    The  widespread  adoption  of  data 
interchange  standards  is  also  essential  if  data 
archives are to preserve software-dependent data
without  the  need  to  also  preserve  hundreds  of 
different,  non-standardized  software  systems. 
But  if  the  demand  for  standards  comes  solely 
from  data  archives,  it  is  unlikely  that  software 
and  hardware  vendors  will  respond.    The  need 
among  administrative  agencies  to  exchange  data 
creates  a  myriad  of  new  problems  in  the 
identification  and  selection  of  data.    But  this 
need  may  also  result  in  simpler  and  more 
uniform  ways  of  exchanging  information  which 
may  ultimately  make  it  easier  for  data  archives 
to preserve complex data sets.




History  and  the 
Data  Archives 


by Hans Jørgen Marker¹
Danish  Data  Archives 


The  Usage  of  Computers  in  History  in  Europe 

The  Mainframe  Age 

In  Europe  the  usage  of  computers  in  historical 
research  has  risen  sharply  in  the  past  few  years. 
This clearly has to do with the increasing
availability of computing power in the form of
microcomputers. Although a hard core of
computer buffs in the field of history completed
large projects back in the 1960s and 1970s, they
were, both in numbers and in methodology, a
small minority in the historical sciences.


¹ Presented at the International Association for
Social Science Information Service and
Technology (IASSIST) Conference held in
Washington, D.C., May 26-29, 1988.


The very first project applying computers in
history was the "Index Thomisticus" by Padre
Busa, dealing with the writings of Thomas
Aquinas.² This project was in the field of
indexing  source  material.    Shortly  afterwards 
another  major  trend  emerged,  the  coding  of  the 
contents  of  sources.    Sources  were  standardized 
in  a  way  that  facilitated  computerized  analysis 
especially  by  means  of  statistical  software  or 
rather  crude  business  type  database  software. 

Among  the  major  database  projects  of  the 
mainframe  computer  age  are  the  large  Swedish 
demographic databases of Umeå/Haparanda
and  Stockholm.    These  projects  are  too  large  to 
be  considered  typical,  but  are  on  the  other  hand 
to  some  extent  only  larger  scale  applications  of 
principles  which  governed  most  of  the  historical 
computing  of  the  time. 

The projects of the earlier days usually relied
on one or several systems programmers being
employed on the project. The software
developed for a project was dedicated to the
project in question, and though a number of
roughly similar database management systems
were developed, they were almost never used by
other projects.

In many cases the developing project was most
willing to share its software with others, but
those carrying out new projects felt more
inclined to develop their own systems than to
adopt something existing.

Among the exceptions to this rule is the
CLIO system developed for the Sperry
UNIVAC at the Max-Planck-Institut für
Geschichte in Göttingen. This system has been
used with success in connection with 20 to 40


² Manfred Thaller: Data Bases versus Critical
Editions,  in  Data  Base  Oriented  Source  Editions, 
Papers  from  two  Sessions  at  the  23rd 
International  Congress  on  Medieval  Studies, 
Kalamazoo,  Michigan,  5-8  May,  1988,  p.2. 




research projects.³

The  PC  Age 

With the proliferation of PCs since the
beginning of the 1980s, the general pattern of
the application of computers among historians
has changed fundamentally. The audience for
historical computing has increased dramatically.
Not only has computing power become more
abundant, friendly and cheap; it also seems that
many historians feel more comfortable with
having the computer on their desk than in the
building next door.

Naturally, the first thing that aroused the
enthusiasm of the historians when they got
their hands on PCs was the range of possibilities
in the field of word processing. This enthusiasm
has not yet died out. Historians and other
people in the arts departments are zealously
discussing the advantages of one package over
another. Even at scientific conferences, you
may still come across papers praising a
particular word processor as the ultimate research
tool. Another wing of this discussion is the
debate on how to apply the text formatting
packages of the mainframe computer days in
the microcomputer scene, and on the
methodological advantages of text formatting as
opposed to mere word processing.

Later on, the use of commercial database
management systems caught up. The most
widespread use of these systems is the making
of computerized files to replace the card-index
files which many historians have kept from time
immemorial.


Also, some historians have become programmers,
as historical computing is now being done on an
individual basis more than within the framework
of large-scale projects. The historians'
programming effort follows several different
paths, such as the presentation of historical data
for educational purposes or the making of tools
for solving specific research questions. A major
area of historical program development is record
linkage, a question which has always attracted a
lot of attention, even in the days of mainframe
projects.

The great diversity among the uses of
computers in history was seen at the two
conferences of the Association for History and
Computing in 1986 and 1987. The proceedings of
the 1986 conference have been published,⁴ and
the proceedings of the 1987 conference will be
available soon.


Historical Informatics

A specialized theory for the application of
computerized methods in history is beginning to
emerge.⁵ This particular branch of science may
appropriately be called historical informatics.

The general concept of this trend in science is
that there are questions in history which are
specific to historical science and which demand
special software solutions, which are not likely
to be provided by commercial software
developers or by any other nonhistorians for
that matter.


³ Manfred Thaller: KLEIO, Ein fachspezifisches
Datenbanksystem für die Historischen
Wissenschaften, Göttingen 1987. KLEIO is the
PC  version  of  CLIO. 


⁴ Peter Denley and Deian Hopkin (eds.): History
and Computing, Manchester 1987.
⁵ A justification of this specialized theory can be
found  in  Manfred  Thaller:  Why  do  we  need  a 
Theory  of  Historical  Computing.    In 
"Proceedings  from  the  second  annual 
Conference  of  the  Association  for  History  and 
Computing,"  edited  by  Charles  Harvey, 
Manchester  1988. 




The questions that a historian would like to put
to  a  given  data  collection  are  usually  not  of  the 
kind  that  is  supported  by  searches  for  a  specific 
word  in  a  given  field  or  for  some  numerical 
value  in  another  field. 

Many terms and objects are time dependent and
dependent on context. By making time a part
of the context, both could be referred to with
the general term "context sensitivity". An
obvious example of time dependency is the
exchange rates of the different coins of the
Kipper Wipper time.⁶ In Denmark the number
of 'skilling's to a 'daler' increased from 64 to 96
over a period of 23 years. At the same time,
the monetary units of that time in Denmark are
an example of dependency on context as such,
as the term 'daler' in an official document of,
say, 1611 would mean a 'rigsdaler' of 74
'skilling's, while in a private estate ledger or
letter it would mean a 'sletdaler' of 64
'skilling's. Another obvious example of time
dependency would be the boundaries of a
specific territory, say Prussia.⁷ A third example
of context sensitivity is the term 'a hundred',
which in north-western Europe might mean 100,
120 or 144 depending on context. This holds
true even when a hundred is written in figures
(100).
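
What context sensitivity asks of software can be shown in a minimal sketch. It is an illustration only and not part of any existing system: the names Context, DALER_RATES, resolve_daler and to_skilling, and the source-type labels "official" and "private", are assumptions made here. Only the two 1611 values are taken from the example above; a usable table would need an entry for every period and every kind of source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """The circumstances in which a term was written down."""
    year: int
    source_type: str  # hypothetical labels: "official" or "private"


# (year, source type) -> (kind of daler meant, skilling per daler).
# Only these two 1611 entries come from the text; all others are missing.
DALER_RATES = {
    (1611, "official"): ("rigsdaler", 74),
    (1611, "private"): ("sletdaler", 64),
}


def resolve_daler(ctx: Context) -> tuple[str, int]:
    """Return which daler is meant in this context and its rate in skilling."""
    try:
        return DALER_RATES[(ctx.year, ctx.source_type)]
    except KeyError:
        # Refuse to guess: an unknown context needs further research.
        raise LookupError(f"no rate known for {ctx}") from None


def to_skilling(daler_amount: int, ctx: Context) -> int:
    """Convert an amount in 'daler' to skilling under the given context."""
    _, rate = resolve_daler(ctx)
    return daler_amount * rate
```

The amount alone is never enough in this design: the same recorded "10 daler" yields 740 or 640 skilling depending on the context supplied with it, and a context missing from the table produces an error rather than a silent default.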

Another  important  aspect  of  historical  data  is 
'fuzziness'.    Data  are  fuzzy,  when  seemingly 
accurate  data  are  in  fact  the  expressions  of  an 
underlying  distribution.    An  example  could  be 
that a document gives the age of the persons
mentioned in multiples of five years. In a
document  like  that  the  age  15  would  probably 


⁶ The first quarter of the seventeenth century.
Information on the effect of the Kipper Wipper
time in Denmark can be found in: H.J. Marker:
Sletdalerbegrebet i første fjerdedel af 17.
århundrede, Historie XV, 4, 1985, p. 633 et seq.
⁷ The example is given in Manfred Thaller: Why
do we need a Theory of Historical Computing?,
in Charles Harvey (ed.): Proceedings of the
second annual conference of the Association for
History and Computing, Manchester 1988.


mean  something  like  an  age  between  13  and  17, 
but  to  determine  or  approximate  the  actual 
distribution of the ages underlying the number
15  would  be  a  research  project  in  itself. 
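
The rounding example can be made concrete in a small sketch. It treats a recorded age as an interval rather than a point, under the simplifying assumption, made here and not taken from any source, that ages were rounded to the nearest multiple of five. The function names are hypothetical. The sketch says nothing about how the true ages are distributed within the interval, which remains the open research question.

```python
def age_interval(recorded_age: int, step: int = 5) -> tuple[int, int]:
    """Return the inclusive range of true ages a rounded age may stand for.

    Assumes ages were rounded to the nearest multiple of `step`; the real
    distribution behind each recorded value is unknown.
    """
    if recorded_age % step != 0:
        raise ValueError(f"{recorded_age} is not a multiple of {step}")
    half = step // 2
    return (max(0, recorded_age - half), recorded_age + half)


def may_be_same_age(recorded_age: int, exact_age: int, step: int = 5) -> bool:
    """True if an exact age from another source fits the fuzzy recorded age."""
    low, high = age_interval(recorded_age, step)
    return low <= exact_age <= high
```

A comparison of this kind might be useful in record linkage, where a person listed as 15 in one source and as 14 in another should not be ruled out as the same individual on the ground of age alone.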

The  software  needed  to  handle  context 
sensitivity  and  fuzziness  is  clearly  something 
quite different from commercial database
management  systems. 


The  historical  workstation 

One  theoretical  attempt  at  approaching  the 
uniqueness  of  historical  computing  is  the 
historical  workstation.    The  technical  definition 
of  this  term  is: 

A desktop computer, which

has access to data base management software,
which is able to administrate very different
structures of information, allowing to put into
a data base arbitrarily large collections of
sources, keeping as much information of the
original and applying as little coding, as is
economically feasible for the project
producing the data base,

has access to a set of data bases, which
contain background knowledge specific to
historical research,

has access to a large number of read only
data bases, being equivalent to traditional
printed editions of source material,

contains sufficient Artificial Intelligence
subsystems to make the interaction between
the forementioned capabilities transparent to
the user,

has a very highly integrated interface
between the data base management system
mentioned and a desk top publishing system
and

a similar interface to statistical software.⁸

This  definition  shows  that  the  historical 
workstation  is  a  software  concept  more  than  a 
hardware concept. Although a piece of
hardware  is  necessary  and  will  have  to  meet  a 
number  of  minimum  requirements,  the  software 
for  the  workstation  is  likely  to  be  made 
available  on  several  different  brands  of 
hardware.    Most  of  all,  the  historical  workstation 
project  can  be  seen  as  a  framework  for  the 
co-ordination  of  international  co-operation  on 
software  development  in  the  field  of  history. 

The project concerning the realization of a
historical workstation was initiated at the
Max-Planck-Institut für Geschichte in Göttingen,
Germany, and it is progressing with the
co-operation of a great number of institutions
all over Europe. In Göttingen they have also
managed to get funding for some of the major
building blocks of the project from the
Volkswagen Stiftung and IBM Germany.

The historical workstation project serves as a
common standard for program development,
the condition being that the bits and pieces
should fit into the whole in the end. Another
important  prerequisite  of  this  project  is  the 
exchange  of  data,  preferably  over  national 
boundaries. 

Beyond  the  Historical  Workstation 

In  my  view  the  perspectives  arising  from  the 
historical  workstation  go  far  beyond  tool  making 
for  historical  research.    I  think  that  there  is  a 
reasonable  chance  that  the  tool  will  change  the 
trade. This would happen if the results of
research projects were expressed in software
running on a commonly available workstation.
A procedure like that would be especially
relevant when the project in question was
dealing with structures, relationships, values,
models or other things which are easily
expressed on a computer. In this way the next
historian would be starting where the former
ended instead of going through the same
research process again. Or, in other words, by
using software to express research results,
history  would  acquire  the  equivalent  of  the 
symbolic  language  used  by  mathematics,  physics 
and  the  other  natural  sciences. 

A trivial example would be this: once somebody
had expressed the dependency of exchange rates
between coins of a specific territory on time
and context in a piece of software for the
workstation, future research would have two
options: either to use the software as a given
building block in further studies in economic
history, or to refine, correct or expand the
software by means of intensified studies in the
history of currencies.

The  example  is  deliberately  very  trivial  but 
similar  examples  could  be  made  where  the 
original research result was in prosopography or
even  the  structure  of  society. 

This extension of the historical workstation
concept is my own and should not be held
against any of the other participants in the
project. I also confess that the whole concept
of the equivalent of a symbolic language for
history is heresy⁹ as opposed to the common
scientific standard of history today, where a
historian is expected to understand every detail
of the picture which he is presenting.¹⁰

I  would  also  like  to  clarify  that  I  do  not  think 
that the way of doing historical research
which  I  have  outlined  above  should  be 
considered  the  only  "right"  way  of  doing 


⁸ This version of the definition is quoted from
Manfred Thaller: Data Bases versus Critical
Editions, p. 5.

⁹ The word 'heresy' was given to me by Deian
Hopkin.

¹⁰ I cannot resist the temptation to add: "rather
than to present a very large or interesting
picture." (Sorry about that.)




research  in  history  in  the  future.    For  a  number 
of  research  questions  it  would  not  be  possible, 
for  a  great  number  of  others  it  would  not  be 
the most practical way of working. What I do
think, however, is that this way of doing
historical research would give us the possibility
of getting answers to some questions which we
cannot answer today, and even to some which
present-day methods do not take us far enough
to ask.


The Role of the Data Archives

The  data  archives  of  north  western  Europe  are 
national  social  science  data  archives.    They  have 
been  in  existence  for  a  couple  of  decades  now, 
and  are  generally  preoccupied  with  the 
conservation  of  social  science  survey  type  data. 
There are some differences in their
operation, but the similarities are perhaps
greater.

The traditional role of the data archives, to
document and preserve survey type data, is
naturally of great potential interest for
historians. The survey data are unique historic
sources, as they deal with the attitudes of
the general populace and use scientifically
well defined methods to analyze them. This
role  of  the  data  archives  will  be  of  increasing 
importance  as  the  data  materials  archived  in 
them  become  older. 

Data materials from quantitative history usually
fit more or less easily into the general pattern
of data archiving. As they are numeric, they
can usually be forced into the traditional data
description. Some historians clearly feel that
the study description schema is missing the
point about what their study is actually about¹¹
and find the methods of traditional data
processing at the data archives a bit strange.
Even so, some of them are quite willing to live
with that, and a few even find the data
archives useful.

When it comes to data materials with complex
data structures, the traditional data processing
methodology at the data archives runs into
difficulties, as it is designed to cope with survey
type flat file data. I don't think that the data
archives have yet found a common way of
handling material with complex data structures.
One possibility is to store the material and
redistribute it on an as-is basis, which is quite
feasible, but from the historians' point of view it
must be hard to understand that the more
complex and often more valuable materials
receive a much less sophisticated treatment than
the simple and perhaps not all that interesting
materials.

Historians  also  generate  a  number  of 
nonnumeric  data  materials.    There  is  some 
uncertainty  about  whether  they  are  within  the 
scope of the data archives, but in many cases
fruitful historical research will be carried
out with a combination of numerical and
nonnumerical material. For historians working
in that way it would clearly be most practical to
have all their material archived at the same
place. Further, the dividing line between text
and other complex data structures is becoming
increasingly blurred with the advent of retrieval
systems where an entire text is stored and parts
of it are marked up as fields for retrieval.
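
How a stored text and a field structure can coexist may be sketched as follows. This is a toy illustration under assumptions made here, not a description of any particular retrieval system: the class MarkedText, its methods, the field names and the sample sentence are all invented for the purpose.

```python
from dataclasses import dataclass, field


@dataclass
class MarkedText:
    """A source text kept whole, with some passages tagged as fields."""
    text: str
    # Each mark is (field name, start offset, end offset) into `text`.
    marks: list[tuple[str, int, int]] = field(default_factory=list)

    def mark(self, field_name: str, phrase: str) -> None:
        """Tag the first occurrence of `phrase` as a value of `field_name`."""
        start = self.text.index(phrase)  # raises ValueError if absent
        self.marks.append((field_name, start, start + len(phrase)))

    def values(self, field_name: str) -> list[str]:
        """Return every passage tagged with `field_name`, in marking order."""
        return [self.text[s:e] for name, s, e in self.marks if name == field_name]


# Invented sample sentence, for illustration only.
entry = MarkedText("Paid to Hans Jensen 10 daler for two oxen.")
entry.mark("person", "Hans Jensen")
entry.mark("amount", "10 daler")
```

Because the marks are only offsets into the untouched text, the same material can be read as running text or queried by field, which is why the dividing line between the two kinds of data becomes hard to draw.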


"Some  discussions  have  been  going  on  at  three 
annual conferences on the exchange of data
within the historical sciences. The outcome of
these discussions is a proposal for items that
ought to be included in a description of a
historical data material. One version of this
proposal can be found in Herbert Reinke, Kevin
Schurer & H.J. Marker: Information
Requirements and Data Description in Historical
Social Research. A Proposal, Historical Social
Research 42/43, Köln 1987.




In  the  ideal  world  all  the  problems  of  the 
historians  could  be  solved  by  the  national  data 
archive.    In  some  of  the  fields  the  expertise  is 
already  at  hand,  in  some  of  the  others  it  would 
have  to  be  acquired,  but  still  no  other 
institutions  have  a  better  foundation  for  serving 
the  needs  of  the  history  and  computing  people. 
The  national  social  science  data  archives  are 
centres  for  the  exchange  of  data  between 
researchers  and  they  have  for  years  now  been 
operating  an  international  data  exchange 
network  exactly  like  the  one  discussed  by  the 
historians  today. 

Unfortunately  we  are  not  living  in  the  ideal 
world,  and  in  the  world  we  live  in  one  of  the 
problems  is  funding.    In  some  European 
countries  it  is  regarded  as  unlikely  that  the 
increased  load  of  work  which  will  result  from 
increased  interest  of  the  data  archives  in  the 
field  of  history  will  be  compensated  by  an 
increase  of  staff  and  funding. 

In Germany a solution is on the way, as the
Zentrum für historische Sozialforschung has
been re-established as a department of the
Zentralarchiv in Köln. The traditional interest
of the Zentrum is in the field of quantitatively
based social and political history, and as such it
fits in very nicely with the standard procedures
of a social science data archive. Incidentally,
this also happens to be one of the fields of
history and computing which has the strongest
traditions.

In  the  Netherlands  the  possibilities  for  making  a 
historical  data  archive  with  close  connections  to 
the Steinmetz Archive are being investigated, and
as I understand it an experimental historical
data archive is being set up as part of these
enquiries.

In Norway they have NAVF's EDB Senter for
Humanistisk Forskning;¹² this institution is not


¹² The Norwegian general research council's EDP
Centre for research within the humanities.


devoted to history alone but to the humanities
as a whole. They even feel that their major
responsibilities lie in the other fields of the
humanities because, as they put it, "the
historians are already fairly advanced." Even so,
a number of their projects are in the field of
history. The EDB Senter is located in the same
block as the Norwegian Social Science Data
Archive (NSD) in Bergen.

Besides those mentioned, a number of other
European institutions, like the large Swedish
demographic databases, the Norwegian centre
for the registration of historical demographic
data in Tromsø, the Cambridge Group for the
History of Population and Social Structure,
Cambridge, England, and the Max-Planck-Institut
für Geschichte in Göttingen, Germany,
serve as centres for and to different extents also
as data archives for computing in history.


The  DDA  and  History 

The involvement of the DDA in the field of
history goes back a long way. One of the reasons
for the establishment of the DDA was that the
social science surveys were of considerable
potential historical interest. Some of the earlier
DDA  studies  have  perhaps  reached  an  age 
where  their  prime  interest  may  lie  in  the  field 
of  history.    DDA  studies  are  numbered  in 
chronological  order  with  DDA-0001:  Danish 
Omnibus  Survey  1982,  being  the  first  study 
registered  at  the  DDA.    This  happened  in  1982. 
The  oldest  study  at  the  DDA  of  merely 
historical  content  is  DDA-0018:  Danish 
Politicians:  Members  of  Parliament  1849-1968. 
The  oldest  purely  historical  study  at  the  DDA  is 
DDA-0034:  Reconstitution  of  Biological 
Families. Sejerø 1663-1813. This study was
registered  in  the  DDA  in  1976.    Of  considerable 
importance  for  the  DDA's  interest  in  history 
was the contact with Hans Christian Johansen.
This  contact  dates  back  to  the  very  start  of  the 
archive in 1973.¹³

A number of historical research projects have
been carried out at the DDA or with assistance
from the DDA. Among the DDA generated
projects are Prices and Wages in Eastern Jutland
1571-1661,¹⁴ which is a data collection project
on prices found in official ledgers submitted by
local to central administration, and Bilantz 1660,
which is a registration of the creditors of the
crown in the year 1660.¹⁵

As regards projects in which the DDA has
participated, the following three, amongst others,
deserve mentioning: Population History of
Greenland, 1800-1930 (DDA-0235), Court
Records: Elsinore 1612-1730 and Falster
1665-1718 (DDA-506), and Koldinghus Fief
Ledgers (DDA-755 and 830).


DDA Studies in the Field of History

The 1986 DDA catalogue lists 70 studies in the
category history and demography. The holdings
have increased somewhat since then, and as
mentioned above the older social science studies
are aging.

The studies cover very different subjects and
research methods. Among the more well-known
of our studies are the ones produced by Hans
Christian Johansen: DDA-0038: Sound Traffic,
1784-1795, based on the customs register from
the Sound, DDA-0106: Reconstituted Families
in Selected Rural Parishes, 1741-1801,
DDA-0230: Urban Population of 18th Century
Denmark, and DDA-0778: Danish Economic
Statistics 1814-1980.


International Co-operation

Besides being a place for the storing of machine
readable material, the DDA engages in
national and international cooperation on
software development in the field of history,
standardization, computer usage in research etc.

The DDA has followed the workshop series on
standardization and documentation known as the
Thaller group from the beginning in Göttingen,
1985, over Graz, 1986, and Paris, 1987, and we will
also be present in Köln 1988. DDA has a
member in a subgroup of this line of
conferences working on the standardization of
data description in the field of history.

The DDA also has one of the two Danish
members of the Nordic Demographic Data base
Working Group, which is a group of six from
Denmark, Norway and Sweden, and is
responsible for the organization of a triannual
Nordic conference on computing in history, and
DDA has a member of council in the
Association for History and Computing. In
addition, the DDA is in close contact with the
Historical Workstation developers group.


¹³ Rapport fra Referencegruppemøde 1, April
1973.
¹⁴ DDA-1066. Further information on this
project can be found in H.J. Marker: Danish
Prices and Wages and the Micro Computer, in
Deian Hopkin and Peter Denley (eds.): History
and Computing, Manchester 1987, pp. 89-95.
¹⁵ Per Nielsen: Beretning for DDA i finansåret
1985, DDA-NYT no. 36, 1986, p. 25.


The  Historians  and  the  Data  Archives 

It is not entirely clear which role the historians
would like the data archives to play. Although
some organizing is taking place among the
historians within the framework of international
bodies like the Association for History and
Computing, the historians are clearly not
speaking with one voice as yet.

The  quantitative  historians  are  usually  quite 
happy  with  archiving  their  data  materials  at  a 
social  science  data  archive,  while  the  historians 
of  nonnumeric  persuasion  are  less  convinced 
that  the  data  archive  staff  possesses  the 
expertise  needed  to  deal  with  their  data. 
Unfortunately there is quite some animosity
between the historians of the quantitative and
the nonnumeric approach in some countries.

The  diversity  among  the  historians  means  that 
in  some  countries  it  will  not  be  without 
difficulties  to  establish  the  social  science  data 
archive  with  its  clearly  quantitative  tradition  as 
a  generally  accepted  depository  for  historical 
research  materials. 


The  Future? 

It seems clear to me that the national social
science data archives and the funding authorities behind
them have to take a stand soon on whether
they would like to see a network of
institutions relatively similar to, but separate
from, the social science data archives dealing
with historical data, or whether they would rather have
general data archives for social science and
history".

In my view a great number of historians would
be better served by having the expertise and the
data needed for their purposes at the same
place, rather than having some of it at the
historical data archive and some of it at the
social science data archive. Some data materials
are of interest to historians and social scientists
alike, and it would be most practical for the
researcher wanting information on a specific
data material to know that this expertise was to
be found within one institution rather than in
one institution for this data material and in
another for that.


Also  an  institution  dealing  with  history  as  well 
as  social  science  would  have  to  be  larger  than 
institutions  dealing  with  only  one  of  the 
sciences.    This  would  be  of  advantage  in  most 
countries  as  it  would  create  possibilities  for  a 
better  scientific  setting.    My  personal  hope 
would  be  that  the  enlarged  national  data 
archives  dealing  with  social  science  and  history 
alike,  would  act  as  power  houses  for 
developments  like  historical  informatics. 

If a solution with separate data archives for
history and social science is preferred, a number
of routine tasks and a great deal of
development of methodology and software tools
would be exactly alike in the two data archives.
Coordination of the efforts would result in
savings in time and effort. The same would
probably hold true for administration. The
possibilities for saving time and money should
appeal to the funding authorities.

Whether or not the social science data archives
should want to be enlarged with a historical
branch is another matter. To some extent it is
a matter of personal preference: does one
prefer a wider range of activities over a larger
degree of homogeneity?


"In  some  countries  this  problem  does  not  exist 
in the literal sense as history is regarded as one of
the social sciences, while in other countries
history belongs to the humanities. Anyway the
problems do exist when it comes to historical
data materials which are clearly out of line with
the material usually archived at a social science
data archive.




Small-Area  Census  Data 
Services  by  Microcomputer: 

Applications  of  REDATAM  System  in 
Latin  America 


by  Arthur  Conning  and  Ari  Silva' 

CELADE,  Casilla  91,  Santiago,  Chile 

and 

Lawrence  Finnegan 

US  Bureau  of  the  Census 

Washington,  D.C.,  U.S.A. 


Abstract 

A  study  in  1983  found  that  census  information  for  specific  small  geographical  areas  was  often  not 
available  from  the  population  and  housing  censuses  in  the  Latin  American  and  Caribbean  countries 
because  the  political  and  administrative  boundaries  used  in  the  census  frequently  do  not  correspond 
to  the  particular  areas  of  interest  and  because  most  statistical  offices  were  not  able  and/or  willing  to 
reprocess  the  large  census  files  rapidly  and  at  low  cost  on  their  mainframe  computers.    The 
interactive  REDATAM^  system,  in  English  and  Spanish  versions,  was  created  to  solve  the  problem  of 
providing  small-area  population  and  housing  information  by  using  an  IBM  or  fully  compatible 
microcomputer  to  store  the  microdata  of  an  entire  census  on  a  hard  disk  (or  laser  disks  for  larger 
countries)  and  to  permit  any  tabulation  to  be  produced  rapidly  for  any  area  down  to  city  blocks  or 
smaller. 

The census (or survey) data is stored in compressed form (approximately one fourth of the original
space requirements) in a database that makes it possible to access the data directly for a given small
area without having to process the remainder of the data. Version 3.1 is presently available in
English and Spanish with associated User and Database Generation manuals.


'Presented at the International Association for Social Science Information Service and Technology
(IASSIST) Conference held in Washington, D.C., U.S.A. on May 26-29, 1988.
^REDATAM - REtrieval of DATa for small Area by Microcomputer.

Facilities  offered  by  the  system  include  geographic  selection,  grouping  of  geographic  areas, 
self-documented  databases,  interactive  and  batch  processing,  calculation  of  derived  variables,  use  of 
weighting  factors,  hierarchical  processing,  generation  of  sub-databases,  production  of  files  for  export 
to  other  packages  and  password  protection. 

REDATAM  databases  have  been  installed  for  1980-round  census  data  in  Chile,  Saint  Lucia,  Costa 
Rica,  Uruguay,  Dominica  and  Colombia  and  for  survey  data  in  Guyana.    The  processing  efficiency  of 
REDATAM  makes  it  possible  for  the  small  Caribbean  countries  to  use  REDATAM  for  processing  at 
the  national  level.    In  all  the  countries,  hard  disks  have  been  used  to  store  the  databases,  except  in 
the  case  of  Chile  where  the  16  million  records  are  stored  on  optical  "WORM"  laser  disks  which 
permit  writing  data  once.    The  use  of  hard  disk  or  optical  disk  is  essentially  "transparent"  to  the 
user. 

REDATAM  may  play  an  important  role  in  the  1990  censuses  in  the  Latin  American  and  Caribbean 
countries  since  the  system  will  permit  the  provision  of  timely  small-area  services  (and  at  the  national 
level  in  the  Caribbean  countries)  before,  as  well  as  after,  the  regular  data  processing  and  publication 
of  results  are  ready.    This  should  greatly  increase  the  use  of  the  census  data  by  both  the 
governmental  and  private  sectors,  but  will  require  some  improvements  in  the  data  collection  process 
and  cartography  to  ensure  the  quality  and  convenience  of  using  the  small-area  information. 

A new two-year project will develop an extended system, to be known as REDATAM+, that will
permit  the  cartographic  display  and  analysis  of  population  and  other  information  through  an  interface 
with  a  Geographical  Information  System  (GIS)  and  will  associate  multidisciplinary  information 
describing  geographical  areas  with  multi-level  population  and  housing  data. 


Introduction 

REDATAM  is  a  microcomputer  software  system  for  obtaining  tabulations  and  other  statistics  for 
specific  geographical  areas  rapidly  and  at  low  cost  from  large  files  of  population  or  other  data. 
Although  the  system  may  be  used  to  process  survey,  vital  statistics  and  other  similar  quantitative 
information  to  take  advantage  of  its  high-speed  processing  capabilities,  REDATAM  is  oriented 
primarily  to  more  massive  census  datasets.    It  is  designed  to  store  all  the  original  microdata  (i.e.,  the 
values  of  each  variable  of  each  individual  person  by  person)  of  an  entire  population  and  housing 
census  on  an  ordinary  microcomputer  and  to  allow  users  to  obtain  tables  with  any  of  the  variables 
for  any  small-area  within  the  country  down  to  city  blocks,  normally  within  minutes  and  without 
special  programming  assistance. 

While the microcomputer-based REDATAM system is not designed to be used for obtaining
tabulations for an entire census in the medium and larger countries of Latin America, it does serve
this purpose for small countries in the Caribbean. It should be noted, nonetheless, that frequently
the real time required to receive a REDATAM tabulation for a large city of a few million persons
may be much less than the real time to obtain the same information with a mainframe in a national
statistical office, since the microcomputer can be left running overnight whereas the mainframe job
is often delayed due to the need for a programmer and tape manipulation.


The Demand for Geographically Disaggregated Census Data

In 1983, the United Nations Latin American Demographic Centre, CELADE', carried out a study in
seven countries in the Latin American and Caribbean region to determine the types of requests for
numerical population data which the national statistical offices receive from the public and private
sectors and which the offices have difficulty in answering. The countries selected covered different
situations with respect to physical and population size of country, language, cultural background,
computer facilities and experience of the statistical office, etc. The countries were (1983 estimated
populations in parentheses): Saint Lucia (125,000), Trinidad and Tobago (1.2 million), Costa Rica (2.5
million), Bolivia (6 million), Chile (11.6 million), Peru (18.7 million) and the Brazilian state of São
Paulo (23 million).

While the findings of the study covered a wide range of topics (see Conning, 1983, for a detailed
report), the major population data supply problem faced by all the national statistical offices visited
was that of fulfilling special requests for data for specific geographic areas, usually from the
population and housing census. For example, the Trinidad and Tobago agency Town and Country
Planning, with responsibility for rationalizing land use in priority areas of the island, had to wait
many months for the special tabulations that it requested.

The  findings  of  this  study  helped  identify  a  major  data  supply  problem  that  can  be  defined  in  the 
following  terms: 

a) The development of new projects and programmes and the improvement of social services
normally require information on, for example, the characteristics and spatial distribution of the
labour supply and the population that will be benefited or otherwise affected by the action;

b)  The  population  and  housing  censuses  in  most  developing  countries  are  the  only  source  of  existing 
data  that  has  a  large  enough  number  of  cases  to  permit  useful  tables  to  be  obtained  for  small 
geographical  areas,  but  the  tabulated  information  is  normally  only  available  for  administrative  and 
political  boundaries  that  frequently  do  not  correspond  to  the  particular  areas  of  interest  to  users; 

c) Statistical offices dependent on large computers and programmers for working with census data
cannot reprocess census data rapidly and at low cost to obtain small-area tables, because the
census data files are very large and are conventionally organized and processed, and because the
offices usually give higher priority to their own regular activities than to requests for special
tabulations by other agencies and private organizations.


'CELADE is part of the system of the Economic Commission for Latin America and the Caribbean
(ECLAC).

financed by the International Development Research Centre (IDRC) of Canada.

By  1983-84,  it  was  already  clear  that  the  solution  resided  in  the  use  of  standard  microcomputers  that 
were  just  being  introduced  on  a  wide  scale  in  the  Latin  America  and  Caribbean  region. 


The  Development  of  REDATAM 

As  there  did  not  appear  to  be  any  low-cost  commercial  or  other  software  available  that  was  able  to 
store  large  population  and  housing  census  files  (usually  with  many  millions  of  records),  rapidly  locate 
the  data  for  the  geographical  areas  of  interest  and  efficiently  process  extensive  quantities  of  data  to 
give  results  for  most  requests  in  minutes  or  at  most  tens  of  minutes,  CELADE  began  to  develop  the 
new system, REDATAM, in June 1974.* The system is written in the "C" language.

The  basic  concepts  underlying  the  operation  of  REDATAM  may  be  understood  by  visualizing  the 
"data  matrix"  for  a  simple  census  (an  actual  database  is  usually  more  complex).    The  record  for  each 
person  occupies  a  row  of  the  matrix;  for  each  person  there  is  a  set  of  variables,  e.g.,  age,  sex, 
education, occupation, place of birth, etc. Each variable defines a matrix column. (See Appendix A.)

Normally, to make a tabulation of "Age by Birthplace" for all the persons, the computer reads the
data of each person sequentially and uses the age and birthplace data found for each person to
derive the table. The ordering of the persons in the data matrix is irrelevant. However, since
REDATAM is directed towards obtaining tabulations for a user-defined geographical area, such as
the municipio of La Florinda, it is convenient to order the persons by the code of the area in the
REDATAM database; that is, the person records are sorted beforehand on the geographical
identification so that all the persons in the municipio of La Florinda are together. REDATAM thus
can ignore all the rest of the data and jump immediately to the persons in La Florinda.
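
The idea of sorting on the geographical code and then jumping straight to one area can be illustrated with a minimal Python sketch. It is not REDATAM's actual implementation (the system is written in C and its internal index format is not described here); the record layout, the area codes and the function name rows_for_area are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical person records: (area_code, age, birthplace).
records = [
    ("B02", 34, "abroad"),
    ("A01", 12, "native"),
    ("B02", 70, "native"),
    ("A01", 45, "abroad"),
    ("C03", 29, "native"),
]

# One-time step when the database is built: sort on the geographic code
# so that all persons of one area occupy a contiguous block of rows.
records.sort(key=lambda r: r[0])
area_codes = [r[0] for r in records]


def rows_for_area(code):
    """Return the (start, stop) row range holding every person in `code`."""
    return bisect_left(area_codes, code), bisect_right(area_codes, code)


# Retrieval: locate the block by binary search; the other rows are never read.
start, stop = rows_for_area("B02")
selected = records[start:stop]
print(start, stop, selected)
```

Because the block boundaries are found by binary search on the sorted codes, the cost of locating an area grows with the logarithm of the file size rather than with the number of persons in the country.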

The computer then could read the data for all variables on each person within La Florinda to pick
out the information on Age and Birthplace to create the table of interest. But this is inefficient since
time is wasted reading all the unused variables such as sex, education, etc., which are of no interest in
this tabulation. For this reason, the REDATAM database is also structured using an "inverted file"
(data is stored by variable rather than by person record) so that only the variables of interest need
be accessed for the persons in the area selected.
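
A small sketch may help to picture the "inverted file" layout: each variable is held as its own list, and a tabulation reads only the two lists it needs, restricted to the row range of the selected area. The variable names, the sample values and the function crosstab are hypothetical; the real file structure of REDATAM is not documented in this article.

```python
from collections import Counter

# Hypothetical column-wise ("inverted") layout: one list per variable,
# all aligned on the same geographically sorted row order.
columns = {
    "age_group":  ["0-14", "15-64", "15-64", "65+", "15-64"],
    "birthplace": ["native", "abroad", "abroad", "native", "native"],
    "sex":        ["F", "M", "F", "M", "F"],
    "education":  [1, 3, 2, 1, 3],
}


def crosstab(columns, row_var, col_var, start, stop):
    """Count (row_var, col_var) pairs for rows start..stop-1.

    Only the two requested columns are touched; "sex" and "education"
    are never read for this table.
    """
    rows = columns[row_var][start:stop]
    cols = columns[col_var][start:stop]
    return Counter(zip(rows, cols))


# Age group by birthplace for the area occupying rows 2 and 3.
table = crosstab(columns, "age_group", "birthplace", 2, 4)
print(table)
```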

Since a very large number of records must be stored, REDATAM compresses the information and
eliminates redundant data, making the final REDATAM databases around one-fourth the size of the
original microdata. Depending on the number of variables in the census, it is sometimes possible to
place the census data of up to a million persons on a 20 megabyte hard disk.
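
The article does not say which compression method REDATAM uses. One technique that could plausibly contribute to savings of this order for coded census variables is fixed-width bit packing, in which each value takes only as many bits as its number of categories requires. The sketch below illustrates that general technique under this assumption; the function names are hypothetical.

```python
def bits_needed(n_categories):
    """Smallest number of bits that can hold the codes 0..n_categories-1."""
    return max(1, (n_categories - 1).bit_length())


def pack(values, width):
    """Pack small non-negative integers into bytes, `width` bits per value."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    n_bytes = (len(values) * width + 7) // 8
    return acc.to_bytes(n_bytes, "little")


def unpack(data, width, count):
    """Recover `count` values of `width` bits each from packed bytes."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


# A two-category variable such as sex needs one bit per person,
# so eight persons fit in a single byte.
sex = [0, 1, 1, 0, 1, 0, 0, 1]
packed = pack(sex, bits_needed(2))
print(len(packed), unpack(packed, bits_needed(2), len(sex)))
```

For scale, a million persons on a 20 megabyte disk corresponds to roughly 20 bytes per person for all variables together.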


*A grant was provided by the International Development Research Centre (IDRC) of Canada for the
REDATAM project. Work on REDATAM and its installation in various countries also receives
continuing support from the United Nations Population Fund (UNFPA) and the Canadian
International Development Agency (CIDA).



In  practice,  the  REDATAM  database  is  normally  constructed  with  hierarchically-linked  housing  and 
population  data  to  permit  the  production  of  tables  that  treat  either  population  or  housing  or  both. 
This  makes  it  possible  to  study,  for  example,  the  housing  characteristics  of  immigrants. 
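
The hierarchical link between housing and population data can be pictured as each person record carrying a pointer to its dwelling. The following sketch, with hypothetical field names and sample values, tabulates a housing characteristic over the immigrants only; it illustrates the principle and is not the REDATAM data format.

```python
from collections import Counter

# Hypothetical housing-level table: one entry per dwelling.
dwellings = [
    {"water": "piped"},
    {"water": "well"},
    {"water": "piped"},
]

# Hypothetical person-level table; "dwelling" indexes into `dwellings`.
persons = [
    {"dwelling": 0, "birthplace": "abroad"},
    {"dwelling": 0, "birthplace": "native"},
    {"dwelling": 1, "birthplace": "abroad"},
    {"dwelling": 2, "birthplace": "native"},
    {"dwelling": 1, "birthplace": "abroad"},
]


def housing_of(persons, dwellings, predicate, housing_var):
    """Tabulate a housing variable over the persons matching `predicate`."""
    return Counter(
        dwellings[p["dwelling"]][housing_var] for p in persons if predicate(p)
    )


# Water supply of the dwellings in which immigrants live, counted per person.
immigrant_water = housing_of(
    persons, dwellings, lambda p: p["birthplace"] == "abroad", "water"
)
print(immigrant_water)
```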

It is important to note that, while REDATAM is designed to be used without programmer assistance
for obtaining tabulations and other results, the one-time creation of a REDATAM census database is
complex and requires a proficient programmer. The census data usually is first cleaned to eliminate
logical inconsistencies, must be sorted by geography and then transmitted ("downloaded") from the
mainframe environment to the microcomputer. Information on each variable must be placed in the
REDATAM dictionary, the entire geographical hierarchy of names and codes must be entered
and related to the census data, and finally the REDATAM database must be generated. While all
this can be done relatively easily with the REDATAM software for survey data already on the
microcomputer, the sheer magnitude of most census data files introduces its own complications.
Nevertheless, once the REDATAM database is created, its use is equally simple whatever the
complexity involved in its creation.

The  system  can  employ  either  a  hard  disk  for  the  storage  of  the  databases  from  small  and 
medium-sized  countries  or  an  optical  "WORM"  (Write  Once  Read  Many  times)  laser  disk  for  the 
data  of  large  countries.    The  storage  medium  is  transparent  to  the  user.    As  there  is  normally  no 
reason to change the database, the "write once" limitation of present laser disks merely improves
security.


Characteristics  of  the  REDATAM  Software 

Summary of user-oriented features

Full details on the facilities for data manipulation, statistical processing, geographic selection, etc., can
be  obtained  by  consulting  the  User's  Manual  (1987a;  1988a).    Information  on  database  generation  is 
provided  in  the  corresponding  manuals  (1987b;  1988a).    The  list  that  follows  outlines  the  most  salient 
features  of  REDATAM  version  3.1  from  the  point  of  view  of  a  user. 

English  language  and  Spanish  language  versions:  The  REDATAM  software  and  all  manuals  are 
available  in  separate  English-  and  Spanish-language  versions.    The  tutorial  and  examples  in  the 
manuals  refer  to  a  small  demonstration  database  that  comes  with  the  software.    The  system  is 
designed  to  permit  the  easy  inclusion  of  other  languages  for  all  screens  and  help. 

Geographic  selection:  The  user  defines  the  universe  of  cases  to  be  processed  by  selecting  the 
specific  geographic  areas  of  interest,  which  in  some  databases  can  be  created  from  city  blocks  or 
smaller.    Maps  usually  must  be  consulted  when  census-defined  areas  without  names  are  involved. 

Self-documented  database:  There  is  a  complete  dictionary  of  variable  names,  descriptions  of 
categories, etc. Similar information can also be maintained for recoded or derived variables.




Interactive: The user interacts with the system through menus and other facilities and there is
extensive  context-sensitive  help.    A  "batch  processing"  mode  also  exists  so  that  various  long 
processes  with  millions  of  cases  can  be  left  overnight  without  user  intervention. 

Calculation  of  derived  variables:  New  variables  may  be  defined  by  recoding  and  through  the 
utilization  of  arithmetic  operations.    The  new  variables  may  be  temporary  or  may  be  incorporated 
into  the  database. 

Statistical  results:  As  REDATAM  is  directed  to  normal  census  processing,  three  basic  statistics  can 
be  produced:  frequencies,  cross-tabulations  and  averages,  the  latter  two  with  up  to  four  variables 
in  a  given  table.    Results  can  be  requested  for  sub-areas  within  the  zone  of  interest,  as  well  as 
for  the  entire  zone.    Decimal  values  can  be  used  if  non-integer  weights  are  employed,  for 
example,  to  expand  a  census  sample  or  a  survey. 

Hierarchical processing: The system works with two levels of variables (housing and population)
and results can be obtained for each of the levels separately or combined. For example, the
number of foreign-born persons by sex and age within households can be calculated and
cross-tabulated with a housing quality indicator.

Generation of sub-databases: A REDATAM sub-database for a specific area and/or for a
specific  sub-set  of  variables  can  be  downloaded  for  utilization  on  another  machine  (which  may 
have  less  storage). 

Data  files  for  export:  When  the  basic  statistical  operations  in  REDATAM  are  not  sufficient,  an 
appropriately  formatted  parameter  file  and  the  dataset  for  the  variables  of  interest  for  the  selected 
area  can  be  created  for  immediate  processing  by  SPSS  or  SL-MICRO. 

Storage  and  printing  of  results:  The  statistical  results  can  be  kept  for  later  printing  or  inclusion  in 
a  document,  as  well  as  for  the  production  of  reformatted  output  using  other  packages  such  as 
WordPerfect,  Wordstar,  Lotus,  Mathplan,  etc. 

Protection  of  information:  Since  it  takes  a  user  only  a  minute  or  two  to  define  the  area  and 
obtain  tables  on  the  persons  living  on  a  city  block  or  smaller,  REDATAM  permits  the  assigning 
of  various  levels  of  protection  to  each  database  via  passwords  to  prevent  unauthorized  use  at  very 
disaggregated  levels  or  when  the  number  of  cases  is  very  small  in  a  single  table. 
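
Two of the features listed above, derived variables and weighting factors, can be sketched together. The example recodes single years of age into groups and sums non-integer expansion weights in each cell of a cross-tabulation. The record layout, the weights and the function names are hypothetical; the sketch shows the kind of calculation involved, not the REDATAM command language.

```python
from collections import defaultdict

# Hypothetical person records with a non-integer expansion weight,
# as would arise when a census sample is expanded.
persons = [
    {"age": 8,  "sex": "F", "weight": 6.5},
    {"age": 31, "sex": "M", "weight": 6.5},
    {"age": 47, "sex": "F", "weight": 7.0},
    {"age": 70, "sex": "F", "weight": 7.0},
    {"age": 25, "sex": "M", "weight": 6.5},
]


def age_group(age):
    """Derived variable: recode single years of age into three groups."""
    if age < 15:
        return "0-14"
    return "15-64" if age < 65 else "65+"


def weighted_crosstab(persons, row_fn, col_fn):
    """Sum weights (not raw counts) in each cell of the table."""
    table = defaultdict(float)
    for p in persons:
        table[(row_fn(p), col_fn(p))] += p["weight"]
    return dict(table)


table = weighted_crosstab(
    persons, lambda p: age_group(p["age"]), lambda p: p["sex"]
)
for cell, total in sorted(table.items()):
    print(cell, total)
```

Because the cell totals are sums of weights, they are decimal values rather than integer counts, which is why the system allows decimal results when non-integer weights are used.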

Equipment  required 

The  following  equipment,  easily  available  in  the  Latin  American  and  Caribbean  countries,  is  required 
for  operating  REDATAM: 

1. IBM PC with a hard disk, an XT, AT or 386, or a fully compatible microcomputer.

2.  640  K  main  memory. 

3.  At  least  one  floppy  disk. 

4.  Monochromatic  or  colour  monitor. 




5.  Printer  with  at  least  80  columns. 

6.  Operating  system  PC-DOS  version  2.0  or  higher. 

7. A hard disk with approximately 1.4 megabytes (Mb) available is required for the REDATAM
system and the demonstration database that comes with it. The total amount of hard disk (or
laser  optical  disk)  space  for  storing  the  actual  census  or  survey  of  interest  will  depend  on  the 
number  of  persons  enumerated  and  the  number  and  size  of  the  variables  on  the  questionnaire. 


Availability  of  the  software 

Demonstration  copies  of  the  REDATAM  software,  version  3.1,  and  the  accompanying  manuals  and 
test  database  may  be  obtained  in  English  or  Spanish  by  writing  to  CELADE,  Casilla  91,  Santiago, 
Chile. 


Experience  with  REDATAM  in  the  Latin  American  and  Caribbean  Region 

The  initial  REDATAM  version  2.0  was  operationally  tested  on  the  1980-round  population  and 
housing  census  data  in  the  national  statistical  offices  of  Chile  and  Saint  Lucia,  respectively,  for 
around mid-1987. In addition to the unforeseen demonstration effect that stimulated interest in other
countries  before  the  system  was  publicized,  the  tests  helped  to  determine  enhancements  and 
modifications  that  have  been  incorporated  into  the  present  version  3.1  (released  for  general 
distribution  in  May  1988)  and  to  identify  the  major  extensions  that  will  be  developed  over  the  next 
two  years  (see  below). 

Table 1 lists the census databases installed to date. The database for the approximately 16 million
records of Chile is stored on three "WORM" laser disks of around 115 Mb each, corresponding to
all regions north of Santiago, the metropolitan area of Santiago, and all regions south of Santiago.
Since a laser disk drive was not available in Colombia and a sufficiently large hard disk was not
accessible when the databases were created (March 1988), a separate database was generated for each
of the six regions of the country and for Bogotá. In the case of Uruguay, the REDATAM database
was made (August 1987) for a 15 percent sample of the census to permit working with the
information while the data entry is being completed for the entire census. Costa Rica has also
created databases for household surveys and for birth and death vital statistics.
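
As a rough check on the storage figures quoted for Chile, the average space per record can be worked out from the numbers in the text. The calculation below assumes that a megabyte is 1,000,000 bytes, that the three disks are completely filled, and that dictionary and index overhead is negligible; none of these is stated in the article, so the result is only an order of magnitude.

```python
def bytes_per_record(total_megabytes, n_records):
    """Average storage per record, taking 1 megabyte as 1,000,000 bytes."""
    return total_megabytes * 1_000_000 / n_records


# Chile: three disks of around 115 megabytes for about 16 million records.
chile = bytes_per_record(3 * 115, 16_000_000)

# Rule of thumb given earlier: up to a million persons on a 20 megabyte disk.
rule_of_thumb = bytes_per_record(20, 1_000_000)

print(f"Chile: about {chile:.1f} bytes per record")
print(f"Rule of thumb: about {rule_of_thumb:.1f} bytes per person")
```

Both figures come out at roughly 20 bytes per record, so the Chilean installation is in line with the general capacity estimate.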

Census databases have been generated for the Caribbean countries of Grenada, the British Virgin
Islands and St. Vincent, but have not yet been installed in the countries. A REDATAM database
has also been created for the state of Rondonia, Brazil, to illustrate REDATAM to the national
statistical office (IBGE) for possible use in the 1990 Brazilian census. The Brazilian authorities are
considering the utilization of REDATAM to obtain rapid results from the 1990 census pilot tests and
are  looking  into  the  mass  distribution  of  public-use  1990  census  data  in  the  form  of  REDATAM 
databases  that  would  be  shipped  along  with  a  copy  of  REDATAM  to  permit  users  to  work  on  their 
own  microcomputers. 




The  REDATAM  databases  created  to  date  are  all  installed  in  the  national  statistical  offices  of  the 
respective  countries.    The  statistical  offices  use  the  system  for  their  own  purposes  and  each  also 
provides tabulations for specific areas on request to other governmental agencies, universities and
the private sector. The Colombian national statistical office, DANE, has indicated that as part of its
policy to decentralize information, it will provide each of its regional offices with its corresponding
REDATAM  database  of  the  1985  census.    The  use  of  passwords  provided  by  the  system  will  help 
protect  the  confidentiality  of  the  data  in  these  situations.    (See  Table  1,  Appendix  B.) 

Although the design of REDATAM is explicitly directed to the problem of rapidly obtaining tables
for specific areas selected from a population and housing census database of countries
with several million inhabitants, the processing efficiency of the system has also proven to be
very useful for routinely tabulating the entire censuses of Saint Lucia and Dominica with 125,000 and
74,000 population, respectively. Since these countries do not have mainframe or minicomputer
facilities for census processing, their 1980-round census data were processed in a regional center in
Barbados, along with the data of most of the other smaller Caribbean countries. Until REDATAM
was installed, these countries had to rely on the pre-conceived printed tabulations made for them of
the whole country and of administrative zones, which often do not correspond to the area of interest
for individual users.

The  experience  gained  to  date  in  the  Latin  American  and  Caribbean  countries  shows  that  the 
REDATAM  software  normally  is  first  used  by  the  national  statistical  office,  which  after  creating  the 
census  database,  provides  small-area  data  services  (see  CELADE,  1987c).    Hence,  most  users  of 
census  data  in  the  countries  have  not,  themselves,  utilized  REDATAM,  but  request  the  tables  of 
interest  for  areas  defined  by  them  from  their  national  statistical  office. 

However,  since  REDATAM  permits  "downloading"  a  sub-REDATAM  database  for  an  area  of 
interest  allowing  the  user  to  work  intensively  with  the  data  on  his  or  her  own  microcomputer  (if  the 
statistical  office  is  willing  to  give  out  the  data),  it  is  likely  that  there  will  be  some  diffusion  of  the 
system  to  other  agencies.    In  addition,  some  institutions  have  begun  to  use  REDATAM  with  their 
own  survey  or  other  data  when,  for  example,  processing  speed  is  important  and  the  features  of  a 
much more complete, but slower, system such as SPSS, are not required. As REDATAM can export
the  data  and  parameter  cards  to  SPSS,  once  the  REDATAM  database  is  created,  SPSS  can  be  utilized 
when  required. 

Discussions  have  also  been  held  with  United  Nations  authorities  in  New  York  for  arranging  the 
distribution  of  REDATAM  to  countries  in  Africa,  Asia  and  the  Arab  world.    It  is  not  known  whether 
there  has  been  any  actual  use  of  the  system  outside  Latin  America  and  the  Caribbean  except  that 
the  demonstration  REDATAM  software  has  been  installed  without  modification  in  the  national 
statistical  office  in  China  on  a  Great  Wall  microcomputer. 




REDATAM  and  the  1990  Censuses 

Most  governmental  and  private  organizations  concerned  with  the  development  of  policy  or  the 
formulation  of  plans  and  projects  need  timely  data  along  with  geographical  disaggregation.    But  in 
the  past,  including  the  1980-round  of  censuses,  many  Latin  American  and  Caribbean  countries  have 
been  unable  to  publish  their  volumes  of  census  tables  until  years  after  the  date  of  collection,  making 
the  very  expensively  collected  census  data  more  historical  than  timely. 

However,  once  their  1990  census  data  are  entered  into  the  computer  (all  the  data  or  a  sample),  the 
national  statistical  offices  can  use  REDATAM  to  provide  small-area  data  services  on  request  (and 
national  services  in  the  case  of  the  Caribbean  countries),  long  before  the  census  publications  are 
produced  or  ready  for  distribution.    The  country  may  prefer  to  make  these  services  available  to  the 
general public only after the data have been edited to remove logical inconsistencies, but this will
still be months or more before the tabulations are produced and published. Statistical office staff
from  a  number  of  countries  have  suggested  that  REDATAM  may  permit  them  to  publish  fewer 
tables. 

As  the  1990  census  data  will  be  timely  and  can  be  tailored  to  each  user's  request,  REDATAM  will 
make  it  possible  for  the  first  time  for  the  national  statistical  offices  in  Latin  America  and  the 
Caribbean  to  provide  the  private  sector  as  well  as  governmental  agencies  with  easy  access  to 
small-area  census  data  for  their  own  purposes  (see  ECLAC,  1987,  for  the  1990  census-  and 
REDATAM-related  discussions  of  the  Directors  of  the  Statistical  Offices  of  the  Americas).    Needless 
to  say,  the  availability  of  more  disaggregated  information  will  require  better  primary  control  over  the 
data collection process than was exercised in the 1980 censuses and improved cartography (see
CELADE,  1987d). 


REDATAM-PLUS: Cartographic display with a Multidisciplinary Database

CELADE is now beginning development of an extended version of the system, REDATAM+, to
enhance the usefulness of the system in a variety of circumstances and to keep it abreast of changing
technology and increasing user sophistication.¹

The specific extensions of the REDATAM+ software will allow it to:

a) Retrieve multidisciplinary information describing geographical areas in association with various
levels of population microdata;

b) Display population, housing and other data with cartographic information and carry out spatial
analysis through an interface with a full-feature geographic information system (GIS);

c) Operate within a network to permit more than one user to work with the same database; and

d) Produce camera-ready REDATAM tables for publication.

¹ A grant for this development has been provided by IDRC.

The conversion of the present REDATAM database into a multidisciplinary planning database
containing timely 1990 census data, and the interface with a powerful GIS to permit geographical
analysis, are likely to lead many planners and other users to insist on using REDATAM
themselves, with REDATAM sub-databases for the specific regions, cities or other areas of interest.
The REDATAM+ system will thus be one means of increasing and extending the usefulness and life
of the 1990 census data that will be collected at such high cost in the Latin American and Caribbean
countries. Of course, the use of the GIS with REDATAM+ will be possible only in
situations where the geographical base file containing the cartographic description of boundaries and
other geographical information is available for the areas of interest in a given country.

The  REDATAM+  software  should  be  ready  for  general  distribution  at  the  end  of  1989  (until  then, 
REDATAM  Version  3.1,  described  above,  will  be  available). 


Bibliography 

CELADE, 1987a. "REDATAM Version 2.00 User's Manual". CELADE, Santiago. English
LC/DEM/G50 (24 June 1987). [Also available in Spanish].

CELADE, 1987b. "REDATAM: Database generation manual". Santiago, CELADE. LC/DEM/G.53
(October 1987). [Also in Spanish].

CELADE, 1987c. "Considerations for implementing REDATAM data services". Santiago, CELADE.
LC/DEM/R.49 (September 1987). [In Spanish, in a slightly different version: "Consideraciones
para implementar un servicio de datos con el sistema REDATAM". Ref. document no. 7.
Meeting of the Directors of Statistics of Americas. ECLAC, Santiago, 23-25 September 1987.]

CELADE, 1987d. "The relevance of REDATAM system for the 1990 censuses". Santiago, CELADE.
LC/DEM/48 (September 1987). [In Spanish, in a slightly different version: "REDATAM:
Relevancia para los censos de 1990". Ref. document no. 17. Meeting of the Directors of
Statistics of Americas. ECLAC, Santiago, 23-25 September 1987.]

CELADE, 1987e. "REDATAM: A summary". Santiago, CELADE. LC/DEM/R.50 (Sept. 87). [In
Spanish, in a slightly different version: "REDATAM: Un resumen". Ref. document no. 18.
Meeting of the Directors of Statistics of Americas. ECLAC, Santiago, 23-25 September 1987.]

CELADE, 1988a. "Supplementary Manual for REDATAM Version 3.1: Supplement to the User
Manual and the Database generation manual". Santiago, CELADE. Serie A-181 (March 1988).
[Also in Spanish].

Conning, Arthur, 1983. Report to IDRC on the REDATAM Pre-Project Mission, 6-24 June 1983:
An examination of problems encountered by national users in the retrieval of quantitative
population data produced by Latin American and Caribbean statistical offices. CELADE,
Santiago, Chile.

ECLAC, 1987. Report of the Meeting of Directors of Statistics of the Americas, ECLAC, Santiago,
23-25 September 1987. ECLAC, Santiago, LC/G.1482 (24/11/87).




Appendix  A 

DATA MATRIX

                                VARIABLES for each person
                          Age    Sex    Educ    ...    Bthplace    ...

PERSONS         Pers        1
(in geograph-               2
ical order)                 3
                            4
                            .
                            .     Persons in municipio, La Florinda
                            .
                      567,822
                Pers  567,823
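
The layout in Appendix A can be illustrated with a short sketch. It is a minimal illustration in Python, assuming (hypothetically) that each variable is held as one list in person order and that an area index records the first and last row of each area. The names AREA_INDEX, rows_for_area and tabulate, the area names other than La Florinda, and all values are invented for the example; they are not taken from REDATAM itself.

```python
from collections import Counter

# Toy data matrix: one list per variable, one position per person.
# Persons are stored in geographical order, so each area occupies
# a contiguous run of rows. All values here are invented.
AGE = [34, 29, 5, 61, 47, 18, 72, 3]
SEX = ["M", "F", "F", "M", "F", "M", "F", "M"]

# Hypothetical area index: area name -> (first row, last row), inclusive.
AREA_INDEX = {
    "Area A": (0, 2),
    "La Florinda": (3, 6),
    "Area C": (7, 7),
}


def rows_for_area(area):
    """Return the slice of rows that belongs to one area."""
    first, last = AREA_INDEX[area]
    return slice(first, last + 1)


def tabulate(variable, area):
    """Count the values of one variable for the persons in one area."""
    return Counter(variable[rows_for_area(area)])


if __name__ == "__main__":
    print(tabulate(SEX, "La Florinda"))
```

Because the persons are in geographical order, selecting an area reduces to reading one contiguous range of rows, and only the variables named in a request need to be read.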



Appendix   B 

Table 1. REDATAM population and housing census databases created to date (April 1988)

Country and                         Dwellings     Persons       REDATAM space   Storage
date of the census                  (Thousands)   (Thousands)   (Megabytes)     Methods

Chile (1982)                            4 000        12 000          300.0      Laser disk
Colombia (1985)
  (Basic questionnaire *)               5 800        27 800           70.0      Hard disk
Colombia
  (Enlarged questionnaire *)              600         2 800           60.0      Hard disk
Costa Rica (1984)                         500         2 500           60.0      Hard disk
Uruguay (1985) (sample)                   147           450           15.0      Hard disk
Saint Lucia (1980)                         30           125            3.0      Hard disk
Dominica (1981)                            17            74            2.5      Hard disk
Guyana (household survey)                   8            42            1.0      Hard disk

* The full questionnaire in the 1985 Colombian census was applied to 10 percent of the
population. The basic questionnaire with a reduced number of variables was applied to the
entire population.



IASSIST  1990 


CALL  FOR  PAPERS 


IASSIST 

16th  Annual  Conference 

May  30  -  June  3, 1990 

Poughkeepsie,  New  York  USA 


Numbers,  Pictures,  Words  and  Sounds:  Priorities  for  the  1990's 


The  1990  IASSIST  conference  has  as  its  central  theme  "Numbers,  Pictures,  Words  and 
Sounds:  Priorities  for  the  1990's".  This  title  reflects  the  ever-expanding  universe  of  data 
types,  as  well  as  related  hardware  and  software  development.  The  program  will  consist  of 
presentations on a wide variety of topics. The Program Committee is now soliciting
contributions in the forms of papers, proposals for panel discussions, roundtables, poster sessions
and  workshops  to  be  presented  at  the  conference. 

All  papers  or  proposals  concerned  with  the  generation,  transfer,  retrieval,  storage  and  use  of 
machine-readable  social  science  data  will  be  considered.  Papers  which  discuss  issues  and 
technologies  related  to  non-numeric  data  are  particularly  encouraged. 


For  more  information  contact: 


Sarah  E.  Cox-Byrne 

Data  Archives 

Vassar  College  Library 

Box  20  Vassar  College 

Poughkeepsie,  NY  12601  USA 

e-mail:COXBYRNE@VASSAR.BITNET 


Laura  A.  Guy 

Data  and  Program  Library  Service 

3308  Social  Science  Building 

1180  Observatory  Drive 

Madison,  WI  53706  USA 

e-mail:GUY@WISCMACC.BITNET


The International Association for Social Science Information Service and Technology (IASSIST) is an
international association of individuals who are engaged in the acquisition, processing, maintenance, and
distribution of machine readable text and/or numeric social science data.

Founded in 1974, the membership includes social scientists, data archivists, librarians, information specialists,
researchers, programmers, planners and government agency administrators.




IASSIST

Membership form

The International Association for Social Science Information Services
and Technology (IASSIST) is an international association of individuals
who are engaged in the acquisition, processing, maintenance, and
distribution of machine readable text and/or numeric social science
data. The membership includes information system specialists, data
base librarians or administrators, archivists, researchers, programmers,
and managers. Their range of interests encompasses hard copy as well as
machine readable data.

Paid-up members enjoy voting rights and receive the IASSIST QUARTERLY.
They also benefit from reduced fees for attendance at regional and
international conferences sponsored by IASSIST.

Membership fees are:

Regular Membership: $20.00 per calendar year.

Student Membership: $10.00 per calendar year.

Institutional subscriptions to the quarterly are available, but do not
confer voting rights or other membership benefits.

Institutional Subscription: $35.00 per calendar year (includes one
volume of the Quarterly)

I would like to become a member of IASSIST. Please see my choice below:

    $20 Regular Membership
    $10 Student Membership
    $35 Institutional Membership

My primary interests are:

    Archive Services/Administration
    Data Processing/Data Management
    Research Applications
    Other (specify)

Name/phone
Institutional Affiliation
Mailing Address
City
Country/zip/postal code

Please make checks payable to IASSIST and mail to:

Ms Jackie McGee
Treasurer, IASSIST
% Rand Corporation
1700 Main Street
Santa Monica




•  CURRENT  RESEARCH  is  an  international  quarterly  journal 
offering  a  unique  current  awareness  service  on  research 
and  development  work  in  library  and  information  science, 
archives,  documentation  and  the  information  aspects  of 
other  fields 

•  The journal provides information about a wide range of
projects,  from  expert  systems  to  local  user  surveys.  FLA 
and  doctoral  theses,  post-doctoral  and  research-staff  work 
are  included 

•  Each  entry  provides  a  complete  overview  of  the  project, 
the  personnel  involved,  duration,  funding,  references,  a 
brief  description  and  a  contact  name.  Full  name  and 
subject  indexes  are  included 

•  Other features include a list of student theses and
dissertations  and  a  list  of  funding  bodies.  Each  quarter,  an 
area  of  research  is  highlighted  in  a  short  article 


CURRENT  RESEARCH  is  available  on  magnetic  tape,  as  well 
as  hard  copy,  and  can  be  searched  online  on  File  61  (SF=CR) 
of  DIALOG 


Subscription:  UK  £86.00 

Overseas  (excluding  N.  America)  £103.00 

N.  America  US$195.00 

Write  for  a  free  specimen  copy  to 

Sales  Department 

Library  Association  Publishing 

7  Ridgmount  Street 
London WC1E 7AE
Tel:  01  636  7543x360 


Library  &  Information 
Science  Abstracts 


•  international scope and unrivalled coverage

LISA provides English-language abstracts of material in over
thirty languages. Its serial coverage is unrivalled; 550 titles
from 60 countries are regularly included and new titles are
frequently added

•  rapidly expanding service which keeps pace with
developments

LISA is now available monthly to provide a faster-breaking
service which keeps the user informed of the rapid changes in
this field

•  extensive range of non-serial works

including British Library Research and Development
Department reports, conference proceedings and
monographs

•  wide subject span

from special collections and union catalogues to word
processing and videotex, publishing and reprography

•  full name and subject indexes provided in each issue

abstracts are chain-indexed to facilitate highly specific subject
searches

•  available in magnetic tape, conventional hard-copy format,
online (Dialog file 61) and now on CD-ROM

Twelve  monthly  issues  and  annual  index 


Subscription:  UK  £157.00 

Overseas  (excluding  N.  America)  £188.00 

N.  America  US$357.00 

Write  for  a  free  specimen  copy  to 

Sales  Department 

Library  Association  Publishing 

7  Ridgmount  Street 
London  WC1E  7AE 
Tel:  01  636  7543x360 



